Global Placement
In cgra_pnr
, there are several different global placers been implemented,
e.g., slice-based global placer, simulated annealing (SA) based placer, and
analytical placer. The current master
, SHA 9032d87
, uses a combination of
analytical and SA placer. Please notice that the placable unit in the global
placer is a cluster, which is equivalent to the computational kernel. There
are several ways to compute the kernel and you'll find different
implementations in different branch, as it's still work in progress.
Clustering
To reduce the placement time and improve the placement quality, clustering is
used in the global placement. cgra_pnr
uses a novel domain-specific
clustering algorithm based on word2vec
. The main idea is to use graph
embedding to obtain the embedding vector for each instance. Then use the
embedding vector to partition the instances, as opposed to using standard
heuristics to calculate netlist affinity.
Here are the steps to obtain the parition:
- Use Star expansion to expand the hypergraph into a normal graph.
- Use random walk to produce edge file.
- Run
word2vec
on the edge file and produce embedding. - Use
k-means
to cluster the instances.
Please note that the clustering algorithm is still a working-in-progress and will be changed in the near future. Future version will address better kernel discovery as well as efficient parallel training while preserving deterministic placement.
Slice Placer
The chip is divided into slices and the basic unit is a single slice. In the implementation the slice is called Macroblock, a term derived from video compression.
Slice-based placer is able to function very well even if the area utilization is high. However, it is much slower than the analytical placer.
Simulated Annealing Placer
It performs three possible movements when perform annealing:
- Move clusters up to
<dx, dy>
, wheredx
anddy
are random numbers with certain range. - Swap two clusters locations.
- Change the cluster's shape
Each movement has to be legal before committing the changes. Currently it takes a long to converge to a good solution (Python is very slow.)
Analytical Placer
This placer tries to minimize the cost function, which is a combination of HPWL, legal energy, and overlapping energy. See the figure below for better visualization.
The goal is to minimize the following cost function:
where is the force constant for inter-kernel cluster connections, is the distance between cluster and cluster , is the force constant for overlapping, and is the overlapped area between these two clusters. Notice that is not a constant as if two clusters have no overlapping area, the overlapping force is zero. Hence is not differentiable everywhere. One can approximate the overlapping force with , where is determined by the distance between two clusters.
Actually there is another entry in the cost function pertaining to legalization. Due to its complexity it is not covered here.