**Matthew C Scott**


In this paper I present a summary of my current research in the development of a system that attempts to find an optimal matching of two general directed graphs. The problem is intended to model a parallel processing environment: one graph represents the data dependencies between a system of concurrent processes, and the other represents the parallel hardware topology over which we wish to distribute these processes. Besides its obvious practical applications, the problem also provides a basis for analysis of some theoretical aspects of Genetic Algorithms. In particular:

- Consideration of epistasis and the preservation of dependency knowledge through the phenotype->genotype mapping. The graph-matching problem provides an ideal model for such analysis in that the dependencies we wish our GAs to discover are simply the weights of the edges between nodes of the graphs. In other models such dependencies are generally unknown; it seems that any such model, if partitionable into finite elements, would reduce to such a graph anyway in order to define the inherent dependencies.

- Consideration of the complexity of graph problems and the degree to which local search may be used in conjunction with a global GA. Here I am mostly concerned with the topology of the problem and the behaviour of the GA in the space. Deceptive graph-matching problems are developed, some customized simulated-annealing paradigms are exploited, and the use of speciation and nearest-neighbor mating is examined. The local search employs a GA which partitions the graphs in order to reduce the space and avoid premature convergence.

Keywords: Asynchronous Parallel Genetic Algorithms, Load Balancing, Connection Optimization, K-way Graph Partitioning, Autonomous Genetic Algorithms

(* Outline*) (* A*)
Introduction: (* I*) Description of GAs.... (* II*) Problem of Epistasis.
(* a*) Reordering operators: Condensed description of GARF. (* b*) Weighting of dependencies: Concise, intelligible description of the partition of the dependency space according to degrees of density (* as described in prior paper*). (* c*) Masking: Using gradient information. (* d*)
Decomposing problem for local search: Parametrize a multidimensional space into multiple, fairly independent functions. Attack each function with independent GA populations. Surviving strings from each population provide information that may be incorporated by higher-level GAs. For example, the K-way partitioning problem returns sets of nodes with a greater than average degree of connectivity (* note that this equates to minimization of connectivity between sets*). Apply a similar partitioning to the program topology. A higher-level GA may then determine a matching between the set-graph of the hardware and that of the software. The resulting strings representing the matching of partition sets may then have the entire process applied to each pair of matched sets, and so forth in a recursive fashion until the remaining sets are singular processors with assigned programs.
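The recursive partition-and-match scheme above can be sketched as follows. The GA-based K-way partitioner and the higher-level set-matching GA are replaced here by trivial stand-ins (split in half, match by index) purely to show the recursive control flow; all names and the four-node example are illustrative, not the real system.

```python
def partition(nodes):
    """Stand-in for the K-way partitioning GA: split a node list in two."""
    mid = len(nodes) // 2
    return [nodes[:mid], nodes[mid:]]

def match_sets(hw_sets, sw_sets):
    """Stand-in for the higher-level matching GA: pair hardware and
    software sets by index."""
    return list(zip(hw_sets, sw_sets))

def recursive_map(processors, programs, assignment=None):
    """Apply partition-and-match recursively until the remaining sets
    are singular processors with assigned programs."""
    if assignment is None:
        assignment = {}
    if len(processors) == 1:
        # Base case: a single processor receives all remaining programs.
        assignment[processors[0]] = list(programs)
        return assignment
    for hw_set, sw_set in match_sets(partition(processors), partition(programs)):
        recursive_map(hw_set, sw_set, assignment)
    return assignment

mapping = recursive_map(['pe1', 'pe2', 'pe3', 'pe4'],
                        ['se1', 'se2', 'se3', 'se4'])
```

In the real system each `partition` and `match_sets` call would itself be a GA search; the recursion scaffolding is the point here.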

(* III*) Relation to Neural Nets. (* a*) Simulated annealing and the Boltzmann
machine. (* b*) Constrained mean field nets for
Graph Matching [Simic91, Neural Computation 3,268-281] Objective function,
effective cost.

(* c*) Genetic Neural Networks.

(* B*) Analysis: (* I*) Graph analysis of
related problems: (* a*) TSP solutions K-opt local optimization. Edge
representation vs Node representation (* b*) Job Shop Scheduling (* JSS*). (* 1*) Show how a JSS may be reduced to a tiling or
matching problem. (* 2*) Show how directed graph
matching relates to JSS if one breaks the software dependency net into

(* c*) K-way graph partitioning GAs and algorithms. Neighbors-(* hardware*)-only reproduction. (* d*) Graph coloring: Allowing temporary invalid colorings to smooth the search space. The autonomous GA system does this implicitly.

(* II*) Degrees of Complexity, GA-Hardness.
Discuss proportionality of local search and blind-GA global search. For the following tasks (* in consideration of the problem space*): What factors are amenable to local search? Given a graph with an average K-degree, i.e. (* (n-1)·Prob*), what should the population size, rate of convergence and mutation rate be?

Testbed: (* a*) Find
a match of two simple isomorphic graphs. (* b*)
Randomly weight the edges (* still isomorphic, but reduces possibility of
many symmetric matches*).

(* c*) Let the graphs be non-isomorphic.

(* d*) Allow graphs to have non-equal number
of nodes (* allow one-to-many pairings*).

(* e*) Incorporate data dependencies between
programs (* directed graph*).

(* f*) Let the GA system try to create one-way
loops in the processor graph in order to reduce conflict and possibility of
deadlock.

(* g*) Allow nodes to be different sizes. For example, let the processor nodes have sizes (* computational speed / local memory*) between 1 and 3 such that they may accommodate that many standard program nodes. The program nodes may also have variable sizes (* integer <= max processor size*).

(* h?*) Let the hardware topology be unknown. This option, of course, does not provide the ability to do local search as in the multi-level system above, but (* I believe*) it is still possible given autonomous GAs. (* expound upon dynamic autonomous GA system*)

(* C*) Implementations: (* I*)

(* II*) Large populations of autonomous GAs: Each GA species represents a program. Each member has knowledge only of the addresses of a subset of the members of the other species with which its own species/program must communicate. A node determines its relative fitness of position on a particular processor by sending messages to the nodes with which it (* being a representative of the real program*) must communicate. By measuring the times of reply, the node may determine the 'direction' of attraction. That is, it would migrate to a neighbor processor such that one of the reply times would be reduced (* which one? Hmm*). It may not actually know the direction of attraction, but it can tell after migrating whether the situation has improved, and thus backtrack if necessary. Note that this sort of memory provides a simple gradient with which it may guide future movements.

There should be a limit on the number of nodes allowed on a processor; otherwise the whole system would eventually converge upon one. Let each processor keep a population count. A node's fitness/comfort on a processor will have a factor inversely proportional to the population size on that processor. Thus, a remote, lone processor with one connection to the net may not provide a good mapping for any of the program nodes, but it will probably still be occupied, as it has much processing power to lend. Likewise, a node will have a hard time immigrating to an overpopulated processor. If a processor has a large complement of a particular species, it is safe to assume that that is an optimal position for them. But it would not hurt to take a number of the representative members and send them off into the wild unknown. They may find a position wherein they have a better fitness, yet they will still be pulling on the same nodes of the other species (* those with which their representative program must communicate*) as the original cluster is; thus their new position must be relatively good if they are to survive.

There needs to be some kind of function by which a node gives up, dies and reincarnates itself somewhere else. Thus it seems that between competition periods there should be a reporting period in which each processor determines the number and average fitness of each species resident on it, and reports this to a central processor. The central processor would keep a table for the current and last generation in which it tabulates this data. Statistics may be computed and broadcast back to all the processors. Thus a particular node, knowing its own history of fitness in a particular area, would be able to determine whether it had stagnated in a less than optimal area.

Besides migrating, local search may be executed by pairs of nodes swapping pointers to nodes of other species. If one considers the attraction that a node has towards the nodes of the other species to which it has pointers, then one can conceive that these attractions combine to form an acceleration which is a sum of the vector components. Swapping nodes changes the acceleration (* direction and degree of attraction*). After letting the system run for a while, attractors will form. Using the techniques of simulated annealing, the energy of the system is gradually reduced by removing less fit nodes and reducing the average number of nodes that each processor may accommodate. Eventually we have a fairly optimal mapping. This mapping may be saved and the system run again to find another mapping. After N runs we either choose the best (* according to an effective energy/objective function measurement*) or toss the saved optimal mappings back into the system and possibly get further improvements. [Include defense of definition of this system as being a GA]
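The migrate-and-backtrack step described above can be sketched as follows. Here `comfort` combines a node's measured reply time on a processor with the crowding factor (inversely proportional to that processor's population count); both the function form and all names/values are illustrative assumptions, not the real system.

```python
def comfort(proc, reply_time, population):
    """Comfort of residing on proc: faster replies to communication
    partners and fewer co-resident nodes are both better (the crowding
    factor is inversely proportional to the population count)."""
    return 1.0 / (reply_time[proc] * population[proc])

def try_migrate(here, neighbor, reply_time, population):
    """Migrate to a neighbor processor, then backtrack if comfort dropped."""
    before = comfort(here, reply_time, population)
    population[here] -= 1
    population[neighbor] += 1
    if comfort(neighbor, reply_time, population) > before:
        return neighbor                 # the move improved things: stay
    population[neighbor] -= 1           # otherwise backtrack
    population[here] += 1
    return here

# One node on a slow, crowded processor considers a fast, quiet neighbor.
reply_time = {'p0': 4.0, 'p1': 1.0}     # measured round-trip times
population = {'p0': 3, 'p1': 1}         # current resident counts
new_home = try_migrate('p0', 'p1', reply_time, population)
```

Because the node only compares comfort before and after the move, no global knowledge of the topology is needed, matching the "tell after migrating whether the situation has improved" behaviour above.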

(* D*) Results:

(* E*) Conclusions:

**Review of the Quadratic Assignment Problem**

The Quadratic Assignment Problem of order n is the search for a mapping of n activities to n locations. It was first described by (Koopmans and Beckmann, 1957). Originally it was devised to solve the Job Shop Scheduling problem, in which items at each station were transferred periodically to other stations, and the transfers had varying costs according to the item type. Thus, given a limited number of stations, assign the items to them in order to minimize the total average cost of transfers.

D = [d_{ih}] = matrix of distances (location i to location h)

F = [f_{jk}] = matrix of flows (item j to item k)

C = [c_{ij}] = matrix of assignment costs (item i to location j)

Now, simply find a permutation π : i -> π(i), which is a particular assignment of item j = π(i) to station i (i = 1, ..., n), such that it minimizes the cost of the flow (D*F)*C. As the cost function C is easily applied, it is not presented in the equations.

min z = Σ_{i,h=1..n} d_{ih} f_{π(i)π(h)}

This is reformulated as a quadratic function by introducing a permutation matrix X of dimension n×n, such that x_{ij} = 1 if item j is assigned to station i and 0 otherwise. Thus:

min z = Σ_{i,j=1..n} Σ_{h,k=1..n} d_{ih} f_{jk} x_{ij} x_{hk}

where the x_{ij} are such that

Σ_{j=1..n} x_{ij} <= 1 (i = 1, ..., n) : No more than 1 item per station

Σ_{i=1..n} x_{ij} = 1 (j = 1, ..., n) : Every item is assigned

each x_{ij} an element of {0,1}

*Figure 3.** QAP, D,F,C example*

This algorithm is feasible for up to about n = 20. Typically a Branch-and-Bound method is employed to search the permutations on X, and the problem is known to be NP-complete by transformation from the Traveling Salesman Problem. We will employ this algorithm below in the development of various autonomous agents.
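For very small n, the objective min z = Σ d_{ih} f_{π(i)π(h)} can be evaluated by brute force over all permutations; a minimal sketch (the D and F matrices below are illustrative, not taken from the paper's examples):

```python
from itertools import permutations

def qap_cost(d, f, pi):
    """z = sum over i,h of d[i][h] * f[pi(i)][pi(h)] for permutation pi."""
    n = len(d)
    return sum(d[i][h] * f[pi[i]][pi[h]] for i in range(n) for h in range(n))

def qap_brute_force(d, f):
    """Enumerate all n! permutations; feasible only for tiny n, which is
    why Branch-and-Bound is needed beyond n of roughly 20."""
    n = len(d)
    return min(permutations(range(n)), key=lambda pi: qap_cost(d, f, pi))

# Three stations, three items: the heavy flow between items 0 and 1
# should place them at the two closest locations.
d = [[0, 1, 5],
     [1, 0, 5],
     [5, 5, 0]]
f = [[0, 9, 1],
     [9, 0, 1],
     [1, 1, 0]]
best = qap_brute_force(d, f)
```

The optimum here puts the two heavily communicating items at the two locations of distance 1, exactly the behaviour the graph-mapping GA must discover at scale.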

**Review of Genetic Algorithms for Combinatorial Optimization**

The methods of evolutionary computation were first proposed by [J. H. Holland (1962)] and the first published reference to Genetic Algorithms is in [J. H. Holland (1973)]. Borrowing from the mechanics of natural selection and genetics, Genetic Algorithms perform objective-function optimization through cross-over reproduction techniques and survival-of-the-fittest selection algorithms. Basically the algorithm entails:

1. devise a string representation of the function to be optimized (the genotype),
2. create a population of these strings using an initially random distribution,
3. evaluate the strings under the objective function (as a phenotype),
4. choose a percentage of the most fit strings for propagation into the next generation,
5. invoke cross-over reproduction between randomly selected pairs of the surviving strings until the population is restored (apply mutations with some small probability),
6. repeat from step 3.
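The steps above can be sketched as a minimal GA loop. The classic OneMax objective (count of 1-bits) stands in for the function to be optimized, and all parameter values are illustrative:

```python
import random

random.seed(0)

LENGTH, POP, KEEP, GENS, MUT = 20, 30, 10, 40, 0.01

def fitness(s):                          # step 3: evaluate the phenotype
    return sum(s)

def crossover(a, b):                     # step 5: one-point crossover
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(s):                           # small mutation probability
    return [bit ^ 1 if random.random() < MUT else bit for bit in s]

# step 2: initially random population of genotype strings
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):                    # step 6: repeat from step 3
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:KEEP]               # step 4: keep the most fit
    pop = survivors + [mutate(crossover(*random.sample(survivors, 2)))
                       for _ in range(POP - KEEP)]  # step 5: restore population
best = max(pop, key=fitness)
```

Truncation selection plus elitism (the survivors carry over unchanged) makes the best fitness non-decreasing across generations.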

The primary advantages of GAs are their simplicity, robustness and ability to exploit implicit parallelism. And "Furthermore, they are not fundamentally limited by restrictive assumptions about the search space (assumptions concerning continuity, existence of derivatives, unimodality, and other matters)" [David E Goldberg, (1989)].

The phenomenon of *implicit parallelism* comes from the fact that throughout the population of strings there are recurring patterns (schemata) that are each evaluated independently, but whose combined probability of propagation leads to exponentially increasing samples of the observed best. One may envision a hidden network connecting every set of equivalent schemata such that each schema has not only knowledge of its strength with respect to the other schemata on its string, but also a probabilistic knowledge of its performance with any other string whose configuration lies somewhere between it and its cousin strings in the search space (see figure below). This implicit parallelism allows on the order of n^{3} schema evaluations even though there are only n string evaluations [David E. Goldberg (1989)].

*Figure 4.** Implicit Parallelism in a GA population.*

One characteristic that GAs share with other combinatorial search methods is the potential to become trapped in local minima. As usual, a Monte-Carlo (simulated annealing) method is used to enable the GA to escape. This is generally done by introducing new randomly generated strings or by increasing the probability of mutations. Also, a recurring question is how best to vary the proportion of exploration vs. exploitation. Exploitation in a GA is best achieved by having the very highest rated strings reproduce.
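One hedged way to realize the annealing idea above is a mutation-probability schedule that decays with the generation count (shifting from exploration to exploitation) and "reheats" when the best fitness stagnates, to help escape local minima. The schedule shape and all constants below are illustrative assumptions, not the original system's parameters:

```python
import math

def mutation_rate(gen, stagnant, base=0.2, cooling=0.05, reheat=0.5):
    """Annealed mutation probability.

    gen      -- current generation number (cooling term decays with it)
    stagnant -- generations since the best fitness last improved;
                a long stall reheats the system (capped contribution)
    """
    temperature = math.exp(-cooling * gen) + reheat * min(stagnant / 10, 1.0)
    return base * temperature
```

Early on the rate is near `base`; after many generations it is nearly zero unless stagnation pushes it back up.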

**Reducing Network Latency: GAs,
QAP and the GMP.**

A logical way to reduce network latency between two communicating processes is simply to move them closer together. Thus the first project referenced, "Parallel Distributed Genetic Algorithms for the Quadratic Graph Matching Problem" [Scott (1990)], involves the traditional problem of load balancing in an amorphous distributed processing environment, i.e., a graph matching problem. As this problem is transformable to the Job-Shop Scheduling Problem, it was logical to use the aforementioned quadratic function of the mapping of jobs to shops. Since this problem typically has many sub-optimal solutions, a simulated-annealing algorithm was naturally employed. And, naturally again, a Genetic-Algorithm search methodology provides a simple, robust and easily distributed means of competitive hill climbing. The other classical gradient scaling functions do not have the flexibility of information sharing between permutations of solutions. This system was implemented using the Parallel C++ language of [Dennis Gannon (1990)]. The code was limited to a symmetric Alliant FX8 64-node machine, and thus the graphs tested were not tied to any real-time dynamic system. It is expected that it will easily port to Java and thus execute on a truly heterogeneous/amorphous dynamic network topology. It is also expected that intelligent "shepherd" agents will execute this graph mapping function on small (local) partitions of a distributed environment in the forthcoming development.

*Figure 5.** PN->PE QAP and Results*

The intent of the devised GA Q-GMP (Quadratic Graph Mapping Problem) system was to address the problem of process migration in parallel distributed systems in order to maximize data throughput. But several of the problems in such a scheme include (from [Craig Partridge (1993)]):

- a) How to migrate the control of a process,
- b) How to migrate the binary information, instructions and data associated with the process,
- c) How to support heterogeneity and
- d) When and where to migrate.

Here we are not concerned with the details of the implementation (a-c), but rather just the theoretical component of combinatorial optimization (d).

First we define the graph of a distributed software system and a similar distributed parallel processing hardware system. Each processor element (*pe*) has a capacity *k*, which we will illustrate as the relative size of the node's circle. Likewise, each pair of processors has a certain bandwidth (*bw*) between them, which we will display as weighted undirected lines. The software system consists of software elements (*se*) and data weights (*dw*). Call the hardware graph *H* and the software graph *S*. Then

*H = {V_{pe}, E_{bw}} and S = {V_{se}, E_{dw}}*.

The representation of the GA string is just a list, with element i holding the *pe* to which the *se* of the same index will be assigned.

*GA1 = (pe4 pe3 pe1 pe2 pe5)*
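Under this representation, a plausible communication cost for a string charges each data weight *dw* by the inverse bandwidth *bw* of the link joining the two hosting processors, so moving heavily communicating *se* onto well-connected *pe* lowers the cost. The tiny H and S below and the helper names are illustrative assumptions, not the system's actual objective function:

```python
# Illustrative hardware graph H (edge bandwidths) and software graph S
# (edge data weights).
bw = {('pe1', 'pe2'): 10.0, ('pe1', 'pe3'): 2.0, ('pe2', 'pe3'): 5.0}
dw = {('se1', 'se2'): 8.0, ('se2', 'se3'): 1.0}

def bandwidth(p, q):
    """Bandwidth between two processors; co-located se communicate
    at effectively infinite bandwidth (zero cost)."""
    return float('inf') if p == q else bw.get((p, q), bw.get((q, p)))

def comm_cost(assignment):
    """assignment maps each se to its hosting pe; cost charges each
    data weight by the inverse bandwidth of the hosting link."""
    return sum(w / bandwidth(assignment[a], assignment[b])
               for (a, b), w in dw.items())

# Two candidate strings: ga1 puts the heavy se1<->se2 traffic on the
# fast pe1<->pe2 link, ga2 on the slow pe1<->pe3 link.
ga1 = {'se1': 'pe1', 'se2': 'pe2', 'se3': 'pe3'}
ga2 = {'se1': 'pe1', 'se2': 'pe3', 'se3': 'pe2'}
```

A GA minimizing `comm_cost` would favor `ga1`, which is exactly the "move communicating processes closer together" intuition stated earlier.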

Note that if two *se (se _{i},se_{j})* that have an

Note that the QAP for load balancing is also the basis of Flow-Based Routing: a static algorithm that accounts for both topology and load. In this method a traffic matrix F_{ij}, a line capacity matrix C_{ij} and a topology matrix T_{ij} are used, and the mean packet delay is determined from queueing theory. Its drawbacks are that it needs a priori knowledge of the subnet topology, and that the algorithm is not dynamic, requiring an offline calculation due to its O(n^2) complexity. But it will be shown below that intelligent agents can execute this algorithm on small local topologies, inducing global emergent optimization.
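The queueing-theory delay estimate that flow-based routing relies on can be sketched with the standard M/M/1 result: a line of capacity C bits/s carrying λ packets/s, with mean packet size 1/μ bits, has mean delay T = 1/(μC − λ). The weighting of each line by its share of total traffic and all numeric values below are illustrative:

```python
def line_delay(capacity, load, mu=1.0):
    """M/M/1 mean delay on one line: T = 1 / (mu*C - lambda).
    Assumes the line is stable (mu*capacity > load)."""
    return 1.0 / (mu * capacity - load)

def mean_delay(capacities, loads, mu=1.0):
    """Overall figure of merit: each line's delay weighted by its
    share of the total offered traffic."""
    total = sum(loads)
    return sum((lam / total) * line_delay(c, lam, mu)
               for c, lam in zip(capacities, loads))

# A single line of capacity 10 carrying load 5 has mean delay 1/5 s.
d1 = mean_delay([10.0], [5.0])
```

This is the per-topology calculation an intelligent agent could run locally on its small partition of the subnet.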

**RESULTS (EXAMPLE RUN)**

This is the max-fit match on a particular node of the 64 running. Since the best fit migrate, the eventual solution cannot be worse than this.

Note the spike drop near the end of the run. As the system converges on a solution, bold mutations are tested for possible 'MDP' hops to better solution spaces.

**References:**

Emile Aarts, Jan Korst: Simulated Annealing and Boltzmann Machines

Christos H. Papadimitriou, Kenneth Steiglitz: Combinatorial Optimization, algorithms and complexity.

James A McHugh: Algorithmic Graph Theory.

Gregory Rawlins: Foundations of Genetic Algorithms.

G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, D. Walker: Solving Problems on Concurrent Processors.

Peter D. Simic: Constrained Nets for Graph Matching and Other Quadratic Assignment Problems. Neural Computation, Volume 3, Number 2, pp. 268-281.

Heinz Mühlenbein: Evolution in Time and Space - The Parallel Genetic Algorithm (distributed mating, local hill climbing); The Parallel Genetic Algorithm as a Function Optimizer (superlinear speedup, late local hill climbing, subpopulations).

Weber, Depew, Smith: Entropy, Information, and Evolution.

Josef Hofbauer, Karl Sigmund: The Theory of Evolution and Dynamical Systems.

David E Goldberg: Genetic Algorithms.

Richard Dawkins: The Selfish Gene. The Blind Watchmaker.

David E Goldberg, Clayton L Bridges: An Analysis of a Reordering Operator on a GA-Hard Problem.

Dale Schuurmans, Jonathan Schaeffer: Representational Difficulties with Classifier Systems.

Sugato Bagchi, Serdar Uckun, Yutaka Miyabe, Kazuhiko Kawamura: Exploring Problem Specific Operators for Job Shop Scheduling

Gregor Von Laszewski: Intelligent Structural Operators for the K-way Graph Partitioning Problem.

D. Whitley, K. Mathias, P. Fitzhorn: Delta Coding: An Iterative Search Strategy for Genetic Algorithms.

Robert J. Collins, David R. Jefferson: Selection in Massively Parallel Genetic Algorithms. (* graph partitioning; local beats panmictic reproduction*)

Jay N. Bhuyan: Genetic Algorithm for Clustering with an Ordered Representation.

Donald R. Jones, Mark A. Beltramo: Solving Partitioning Problems with Genetic Algorithms.

Nashat Mansour, Geoffrey C. Fox: A Hybrid Genetic Algorithm for Task Allocation in Multicomputers.

Fri Oct 17 16:26:52 MST 1997
