Generative
flowers and flower canopies as visualizations of binary information and of DNA
sequences
Lab for Applied
Epistemology, University of Ghent
philip.vanloocke@Ugent.be
Abstract
A set of three-dimensional
branching structures is defined. The endpoints of trees are associated with
p-bit strings. Coefficients associated with p-bit strings determine the local
curvature of the branches. Long bit-strings are analyzed in terms of shorter
p-bit strings, and the coefficients thus obtained map long bit-strings on a
visualization in terms of a tree-structure. The tree-visualization is turned
into a flower-visualization by definition of a non-linear tree-based
projection. As an alternative, the flower-visualization is attached multiple
times at the endpoints of the tree. The distribution of coefficients of p-bit
strings determine the spread of the flower canopy, the size of flowers, and the
open/closed-attribute. Interpolation of bit-strings yields interpolation of the
corresponding flower-visualizations.
Keywords: visualization, bit-string, non-linear projection,
generative flower, Mandelbrot set
1. Introduction
The beauty of plants and
trees has inspired much work in computer graphics [1]. Fractal approaches to
generative tree construction have been applied to data-visualization by different
authors [2-6]. Typically, such approaches analyze data in a hierarchical way,
so that they can be mapped on a bifurcating tree. The components of the data
which are most important according to a formal analysis (such as an entropy
method, or principal component analysis) are mapped on the initial nodes of the
tree; less important components guide data through subsequent bifurcations
until the leaf-level is reached. This paper visualizes long strings of binary
and quaternary data. A string is analyzed by moving a p-bit window over it, and
by determining coefficients for each of 2p or 4p strings of length p. The short
strings are associated with the leafs of a tree with 2p or 4p end-nodes, and
the coefficients of these strings determine the curvature in the branches of
the tree.
The tree-representation that
is obtained depends on an initialization. An initialization is appropriate if
the spatial structure of branches gives visual information about the p-strings
attached. This can be realized if the initialization of the tree is a highly
structured and familiar form. The present paper uses the boundary of the
Mandelbrot set to initialize trees.
After a tree is bent by data, it is turned into a flower form. This is obtained in two steps. First, endpoints are interpolated by a contour. Second, this contour is projected in a series of steps so that a surface is obtained. The projections are defined by successive small variations of the parameters associated with the tree. The length of line segments in branches is decreased, so that a form is obtained which converges toward the starting point of the tree. Depending on the choice of parameters, more concrete or more abstract flower forms result. Finally, this method is turned into a generator of flower canopies by putting the flowers on top of bent tree representations. The spread of the flowers over the leafs of the tree, as well as their attributes ‘size’ and ‘open/closed’ are based on information derived from p-window analysis.
2. Spatial trees: initialization and bending on basis of p-bit analysis
of long binary strings
Two types of spatial trees are considered in the present
context: binary trees in which, at each bifurcation point, a branch splits in
two branches, and quaternary trees in which branches split in four new
branches. The initialization of a tree is specified by the endpoints of its
final branches, by the starting point S of the tree, and by a series of factors
g1,…,gp, which specify to which extent branches are drawn to S at each level. In
case of a binary tree, 2p endpoints Spj (j=1,…,2p) are given. In all examples
which follow, these points are chosen at constant line distance on an oriented
contour of the Mandelbrot-set. The endpoints are grouped in p/2 pairs of points
with consecutive indexes, and for each pair, the average point S*,p-1j
(j=1,…,2p-1) is determined. These points are contracted toward S by the factor
gp. The resulting points are the starting points of the final level branches of
the tree. This process is repeated until, at level one, two points S1j (j=1,2)
are obtained. The tree is completed by connecting each point S1j with SºS0.
For a quaternary tree, the same procedure is followed.
On the contour of the Mandelbrot-set, 4p points Spj (j=1,…,4p) are chosen at equal line distance. An initial point S
and p factors gi are defined (j=1,…,p). At level p-1, 4p-1 points are obtained by taking averages of
four consecutive points, and by contraction of the latter toward S by the
factor gp. This procedure is iterated until, at level one, 4 points S1j result.
These are connected with S in order to complete the tree. Figure 1 illustrates
these concepts. Figure 1a shows the contour on which the endpoints are defined.
In Figure 1b, a binary tree corresponding to p=8 is shown. Figure 1c
illustrates a quaternary tree for p=5.
Figure 1a-c. Contour of the
Mandelbrot-set with points selected for a binary tree and p=8 (Figure 1a);
Figure 1b shows a binary tree for p=8; in Figure 1c, a quaternary tree for p=5
and different values for gj is shown
In order to bend the tree, each branch is divided into M segments. Consider a sequence of p branches, starting at the first level and ending up at the p-level. Such a sequence is called a ‘long branch’; it consists of N=n.M segments. Consider the h-th segment along such a long branch, and suppose that it belongs to an s-th level branch (so that (s-1).M < h < s.n+1 and s=1,…,p). The length of the line segment is denoted lh .The unit vector eh along the segment is specified by two angles (qh, jh), where qh is the angle between the x-axis and the projection of eh on the horizontal plane, and jh is the angle between the z-axis and eh. The difference between qh+1 and qh, respectively jh+1 and jh, is denoted dqh, respectively djh: qh+1 = qh + dqh and jh+1 = jh + djh. A tree will be bent by specification of modifications for the differences dqh and djh.
In order to
obtain trees which keep symmetry, the bending process is made a function of the
order of branches in the set of two or of four branches to which they belong
(where these sets are defined by the fact that branches of the same set have
the same starting point). In case of a binary tree, values +1 and –1 are
associated with the first, respectively the second branch of a set. In case of
quaternary trees, values +2, +1, –1 and –2 are associated with the branches.
Suppose that the value associated with the s-th branch in the long branch is
denoted as. Then, a simple rule for bending the tree is obtained as
follows:
dqh= dqh + as. e1 and djh = djh + e2
where e1 and e2 are constants
This
rule is complemented with a second one, which inserts additional angular
changes at each first segment of a branch. These changes are periodic functions
of the branching level s, and also linearly increase as a function of s, in
order to prevent the tree from curling into itself too strongly. In case of a
binary tree, the second rule reads:
dqh = dqh + as g1 .s. cos(a1+b1.s)2
If a1= +1 and as = +1 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1= -1 and as = -1 then djh = djh + g2 .s. cos(a2+b2.s)2
where h = 1 + (s-1).n (s=1,…,p),
and a1, b1, a2 , b2 , g1 , and g2 are constants
For combinations of a1 and as not mentioned in the rules for jh, no change of the difference is made. In case of a quaternary tree, the rule following rule was used:
dqh = dqh + has g1 .s. cos(a1+b1.s)2
If a1= 2 and as = -2 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1= 2 and as = -1 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1= 1 and as = -2 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1= 1 and as = -1 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1 = -1 and as = 1 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1 = -1 and as = 1 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1 = -2 and as = 2 then djh = djh + g2 .s. cos(a2+b2.s)2
If a1 = -2 and as = 2 then djh = djh + g2 .s. cos(a2+b2.s)2
where has= 1 if as= +2 or as= +1 and has= -1 if as= -1 or as= -2
The
rule for quaternary trees is formulated in terms of has instead of as
in order to avoid wide horizontal divergence between branches. The second rule
is not applied to the branches connecting S with the first level nodes.
The parameters in the tree-bending rules are made a function of the statistics of sets of combinations of p letters in a binary or quaternary information sequence. Suppose that a quaternary string is considered. For each combination of p letters, the number of times the combination occurs in subsequent places of the string is determined. This defines 4p coefficients, which are associated with the 4p p-th level branches of the tree. Suppose that such a coefficient is denoted wpj (j=1,…,4p). In order to associate coefficients with lower level-branches, the coefficients wpj are propagated backward through the tree. For each lower level-branch at level s (s=1,…,p-1), a coefficient wsj is defined which is the average of the coefficients of the four branches sprouting from it (j=1,…,4s). At each level of the tree, normalized coefficients are determined. Normalization occurs by drawing coefficients at the same level between zero and one by subtraction of the smallest value at the level, and by division by the difference between the largest and the smallest value. Normalized coefficients are denoted nwsj. In order to visually represent the statistics of p-combinations of letters, the quantities nwsj are inserted in the tree-bending process. This is realized by making the parameters e1, e2, g1 and g2 proportional to a power r of one minus the normalized coefficients nwsj (a1, a2, b1 and b2 are kept constant throughout the whole tree). In case of binary trees, the similar procedure is followed.
The quantities nwsj are generally not symmetric, so that the initial symmetry of the tree is not conserved. The aesthetics of bent tree-forms (and of the flowers enveloping them in the next section) is often significantly increased if mirror symmetry relative to a vertical plane is maintained. In order to obtain this, the coefficients of the s-th and of the 1+4p-s -th combination of letters are replaced by their average value (s=1,…,4p-1). This choice divides the number of statistical quantities represented in the tree by 2. This concepts are illustrated for six DNA codes which can be found on GENBANK (JCvirus, pappilomavirus, human herpesvirus 5, simian adenovirus 3, human adenovirus C, and human adenovirus B). Figure 2a-2f were generated for quaternary trees with p=8 and r=2. Figure 3a-f correspond to binary trees with p=8 and r=1. The Figures were obtained for the initialization shown in Figure 1c.
Figure 2a-f. Non-linear quaternary trees for p=5 and for JCvirus, human
pappilomavirus, human herpesvirus 5, simian adenovirus 3, human adenovirus C,
and human adenovirus B
3. Flowers visualizing binary and
quaternary strings by non-linear tree-based projection of contours
The contour by which the endpoints of trees are initialized can be reconstructed after the bending process. Endpoints with successive indexes can be connected by a line, or can be interpolated in a non-linear, curved way. The latter option will be followed in the present paper.
Consider the line interval
between end-points with indexes k and k+1. The length of the interval is
notated d. Suppose that, for a given number q, G is the average point of the
endpoints with index between k-q and k+1+q (if an index
in this set is smaller than 1, 2p is added to it; if it is larger
than 2p, then 2p is subtracted from it). For each point P
on the line interval, the unit vector originating in G and pointing to P is
determined. Then, if d* is the distance between P and the k-th endpoint, P is
displaced in the direction of the unit vector by an amount proportional to
(0.5)2 - ((d*/d) - 0.5)2.
Once the tree is
constructed, the initial contour can be complemented with a set of t-1 additional contours which connect endpoints of
trees generated for different parameters. If the space between contours is
filled, an object results. Suppose that, among the variations in parameters,
the length of line segments decreases stepwise to zero. Then, the initial
contour shrinks and is drawn to the origin, so that, when contours are
interpolated, a structure results which is closed at the origin and which opens
up toward the initial contour. This procedure can be used to generate concrete
or abstract flower-like structures on basis of an initial contour. In the
illustrations which follow, the length of line segments is contracted linearly
when subsequent trees are drawn. When the sequence of contours is reversed,
contours grow upward from the origin by increase of the length l
of
line
segments according to l = lini.(q-1)/(t-1) (q =1,…,t), where lini is the length of
the segment in the tree corresponding to the initial contour. This process is
turned into a non-linear projection method by a) specifying that the vertical
angles j of all line segments are replaced by angles
c1.j+c2.z, with c1=((s-1)/(t-1))r, c2=1-c1, and where
z is a fixed angle, and b) by linearly
decreasing the impact of the data on the bending process when generating trees
shrink toward the origin.
This
process is illustrated for generative, binary strings in Figure 3e-f. The
Figures were generated for bending parameters of weak, intermediate, an higher
value. In Figure 4a-b, two DNA sequences were recoded as binary strings first,
after which the corresponding flower structure was generated. In Figure 5a-b
and 6a-b, two DNA sequences were visualized by quaternary trees and for
different values of z.
Figure 3a-b. Two flower representations of
bit strings for low values of bending parameters
Figure 3c-d. Four flower visualizations of bit-strings
for intermediate values of bending parameters
Figure 3e-f. Four visualizations of bit-strings for high values of
bending parameters
Figure 4a-b. Flower structures corresponding to binary
recoded human adenovirus C and human adenovirus B
Figure 5 a-b. Flower
structures corresponding to human adenovirus C and human adenovirus B
Figure 6a-f. Flower structures corresponding to human
adenovirus C and human adenovirus B, and for different value of z
4. Flower canopies as
visualizations of information strings
An
object generated by the method of the previous section can be ‘fractalized’ on
basis of intrinsic parameterization. On each contour in the set of non-linear
projections of the initial contour, 2p endpoints of a tree are
marked. On the q1-th contour (0 < q1 £ t), r1 endpoints can be selected at which
a new object is attached (0 < r1 £ 2p). In special, the object itself can be
attached in a scaled version at such a point. In the present paper, the object
is rotated over an angle which is equal to the angle between the z-axis and the
final line segment pointing to the endpoint. This process can be iterated. On
the scaled object, the q2-th contour can be selected (0 < q2
£ t), and a scaled
rotated version of the original object can be attached at r2
selected endpoints (0 < r2 £ 2p), and so on. Figure 7 gives an
illustration for an initial contour corresponding to a non-bent tree. The
figure shows the initial object along with two stages of fractalization.
Figure 7. Form generated for non-bent
tree on initial contour and with two levels of fractalization
When bit-strings are visualized, the visual accessibility of a structure is enhanced if the initial object is not drawn, and if objects at the first stage of fractalization are attached at the top-contour of the initial object. As a result, a branching stem is provided with flower-structures at its endpoints. The density of the canopy can be made a function of properties of the distribution of coefficients in p-bit analysis. Suppose that q1 sites are determined in a symmetric way so that the i-th site is symmetric relative to site 1+2.q1-i (i=1,…,q1). For the q1/2 first sites, the function b(i)= 1-Abs(q1/4-i) / (q1/4) is calculated (Abs denotes the absolute value function; b(i) increases linearly from 0 to 1 between 0 and q1/4; it decreases from 1 to zero between q1/4 and q1/2); for the q1/2 last sites, the function b(i) is given by b(i)=1-Abs(q1/4-(i-q1/2)) / (q1/4). Two parameters associated with the distribution are calculated. First, the highest coefficient cmax of p-bit strings is determined. Second, the number of coefficients which are larger than x.cmax is calculated (in the illustrations which follow, x=0.5). The square root of this number is denoted s. Then, at each of the sites i (i=1,…,q1), a rotated version of the initial flower-structure is attached with a scale proportional to cmax.b(i)(r1-r2.s). Since b(i) is smaller than one, a low value of s entails that a steep distribution of flowers in the canopy results and vice versa. The numbers r1 and r2 in the exponent of b(i) have to be chosen in such a way that, for the largest value of s in the set of strings which are visualized, r1-r2.s is always positive.
Along with size, the attribute
‘open/closed’ is associated with each flower. In the construction process of
the flower, the quantity b(i)(r1-r2.s) is used to modulate vertical angles
of the line segments of all contour-projecting trees. More specific, the angles
of all segments in the s-th tree are multiplied with a factor f given by f = (c1+c2
(1-
(s-1)/(t-1))),
where c1= b(i)(r1-r2.s) and c2=1-c1.
This factor leaves angles at the bottom of a flower unaffected, and reduces
angles to c1 at the top of a flower. In order to prevent this
process from being accompanied with the appearance of long peeks in the center
of the flower structures, the length of all line segments can be scaled with a
factor (e1-e2 (s-1)/(t-1)),
where e1 and e2 are constants and e2=1-e1.
In the illustrations which follow, e1=0.75 and e2=0.25.
This way, the larger flowers in the Figures open up, whereas smaller ones
remain closed. Figures 8-9 illustrate flowers generated for two different
bit-strings, with s=3 and s=7, respectively.
Figure 8. Flower visualization of bit-string with s=3
Figure 9. Flower visualization of bit-string with s=7
5. Discussion
The use of a non-circular initial contour for the trees was motivated by the fact that a contour with structure allows one to more easily visually locate a leaf corresponding with a particular p-coefficient. The association between p-coefficients and leafs was weakened by the symmetry operation to which all trees were made subject. Removal of the symmetry constraint gives trees which are more informative, but with degrading aesthetics. Since aesthetics eases visual assessment of complex representations, it was adopted nevertheless.
It
can be noticed that bit-strings can be interpolated in a stepwise way. A first
bit-string which differs in n bits from a second one can be turned into the
latter by n successive bit clips. In the corresponding visualization, this
process entails that the flower structure associated with the former gradually
changes into the flower structure associated with the latter. Consider a very
long bit string of length N1, for instance a string corresponding to
a long piece of music. A long window of length N2 (with N2<N1)
can be moved stepwise over the string with length N1. If the string
in the N2-window is subject to p-bit analysis and visualized by the
present method, then moving the N2-window corresponds with movement
in the flower visualization. In the visualization of section 4, this implies
that the tree structure is transformed continuously, that the flower canopy
widens and shrinks, and that flowers open and close as long as the window is
moved. At this moment, this process is still too computation intensive to
obtain real time visualization of music on ordinary hardware, but it offers
another outlook for further continuation of this research.
[1]
P. Prusinkiewicz, A. Lindenmayer, The algorithmic beauty of plants, New
York: Springer
[2]
E. Kleiberg, H. van de Wetering, J. van Wijk (2001), Botanical visualization of
huge hierarchies, in Andrews, K., Roth, S., Wong P. (eds.) Proceedings of
IEEE symposium on information visualization, 87-94
[3]
B. Shneiderman (1992), Tree visualization with tree-maps: a 2-D space-filling
approach, ACM Transactions on graphics, 11(1): 92-99
[4]
H. Koike, H. Yoshihara, Fractal approaches for visualizing huge hierarchies, in
Proceedings of the IEEE 1993 symposium on visual languages, 55-60
[5]
J. Carriere, R. Kazman (1995), Interacting with huge hierarchies: beyond cones
trees, In: Proceedings of the IEEE 1995 symposium on information
visualization, 74-81
[6]
P. Van Loocke (2004), Data visualization with fractal growth, Fractals, 12(1),
123-136