Musical ArTbitrariness
Information Technology Institute, Campinas, SP, Brazil
e-mail: artemis@ia.cti.br
Department of Computer Engineering and
Industrial Automation
School of Electrical and Computer Engineering
State University of Campinas, SP, Brazil
Interdisciplinary Nucleus for Sound Studies
State University of Campinas, SP, Brazil
Abstract
Evolution is now considered not only powerful enough to bring about biological entities as complex as humans and consciousness, but also useful in simulation to create algorithms and structures of higher levels of complexity than could easily be built by design. In what follows, we present an attempt to perform interactive iterative optimisation using evolutionary computation and other computational intelligence methodologies. When artistic aspects are involved, every attempt to improve aesthetic judgment is denoted here as ArTbitrariness. Our emphasis is on an approach to interactive music composition.
When human judgment replaces a formal fitness criterion, a fitness surface still exists, but it cannot be expressed in mathematical terms. For music, phase spaces are suggested as one more feature of the visual domain that may be used as a fitness function to explore the musical domain.
Complexity may be defined as the situation in which, given the properties of the parts and the laws of their interaction, it is not an easy matter to infer the properties of the whole. The complexity of a system may also be described not only in terms of the number of interacting parts, but in terms of their differences in structure and function. There are special problems, however, when examining systems of high complexity: a system may have so many different aspects that a complete description of it is impossible, and so is a prediction of what it will produce. The analysis of such a system would require the destruction of the system and thus preclude the completion of the analysis. Nevertheless, on examining the estimated sequence of organisms, the general conclusion is that the earliest forms were simple and small and that, as evolution progressed, increasingly complex entities appeared.
There is an enormous complexity in living systems, complexity which does not persist for its own sake but rather is maintained as entities at one level are compounded into new entities at the next higher level. Increased complexity allows for a greater range of potential behaviors and thus provides a competitive advantage [1]. Behavioral error, as measured in the current environmental context, is the sole quality sieved by competitive selection. Behavioral error is measured by the costs and consequences of incorrectly predicting forthcoming sequences of environmental symbols [2].
Evolution is now considered not only powerful enough to bring about biological entities as complex as humans and consciousness, but also useful in simulation to create algorithms and structures of higher levels of complexity than could easily be built by design. Both biological and simulated evolution involve the basic concepts of a genotype and a phenotype, and the processes of expression, selection and reproduction with variation. The genotype is the genetically encoded information for the creation of an individual. The phenotype is the individual itself, or the form that results from the developmental rules and the genotype. Expression is the process by which a phenotype is generated from a genotype. Selection depends on the process by which the fitness of phenotypes is determined. Fitness is simply the ability of an organism to survive and reproduce. Reproduction is the process by which new genotypes are generated from an existing genotype.
For evolution to progress there must be variation, or mutation, in new genotypes with some frequency. Mutations are usually probabilistic, as opposed to deterministic. Selection is, in general, non-random and is performed on phenotypes, while variation is usually random and is performed on the corresponding genotypes. The repeated cycle of reproduction with variation and selection of the fittest individuals drives the evolution of a population towards higher and higher levels of fitness. Sexual combination allows genetic material from more than one parent to be mixed together in some way to create new genotypes. This permits features to evolve independently and later be combined into a single genotype. Although sexual combination is not necessary for evolution to occur, it is valuable for enhancing progress in both biological and simulated evolution. Representations for genotypes that are not limited to fixed spaces and can grow in complexity have been shown to be worthwhile.
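The cycle of expression, selection and reproduction with variation can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of any system discussed in this paper: the bit-list genotype, the integer phenotype, the toy fitness target of 200, and the function names.

```python
import random

def express(genotype):
    """Expression: decode a bit-list genotype into a phenotype (here, an integer)."""
    return int("".join(map(str, genotype)), 2)

def fitness(phenotype, target=200):
    """Toy fitness, evaluated on the phenotype: closer to `target` is fitter."""
    return -abs(phenotype - target)

def crossover(a, b, rng):
    """Sexual combination: one-point crossover of two parent genotypes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genotype, rate, rng):
    """Variation: flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genotype]

def evolve(pop_size=20, length=8, generations=40, rate=0.05, seed=0):
    """Repeat selection (on phenotypes) and variation (on genotypes)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(express(g)), reverse=True)
        history.append(fitness(express(ranked[0])))
        parents = ranked[: pop_size // 2]  # non-random selection of the fittest
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - 1)
        ]
        population = [ranked[0]] + children  # keep the best individual unchanged
    best = max(population, key=lambda g: fitness(express(g)))
    return best, history
```

Because the best individual is carried over unchanged, the best fitness in `history` never decreases from one generation to the next; this elitism is a design choice of the sketch, not a requirement of evolutionary computation in general.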
The next section discusses interactive genetic algorithms applied to the visual domain and defines ArTbitrariness. The emphasis then shifts to an approach to interactive music composition denoted Vox Populi, a hybrid made up of an instrument and a compositional environment, appropriate for the study of human perception in the musical domain.
In computer graphics, procedural models are employed to create scenes and animations having high degrees of complexity. Procedural models are used to describe objects that can interact with external events to modify themselves. Thus, a model of a sphere that generates a polygonal representation of the sphere at a requested fineness of subdivision is procedural: the actual model is determined by the fineness parameters. A model that determines the origin of its coordinate system by requesting information from nearby entities is also procedural. By contrast, a collection of polygons specified by their vertices is not a procedural model [3]. The price paid for this complexity is that the user often loses the ability to maintain sufficient control over the results.
Procedural models can also have limitations because the details of the procedure must be conceived, understood, and designed by humans. The techniques introduced by Sims [4] contribute towards solving these problems by enabling the “evolution” of procedural models using interactive “perceptual selection”. Evolutionary mechanisms of variation and selection are used to “evolve” complex equations used in procedural models for computer graphics and animation. An interactive process between the user and the computer allows the user to guide evolving equations by observing results and providing aesthetic information at each step of the process. The computer automatically generates random mutations of equations and combinations between equations to create new generations of results. This repeated interaction between user and computer allows the user to search hyperspaces of possible equations without being required to design the equations by hand or even understand them. Sims also successfully applied genetic algorithms to generate autonomous three-dimensional virtual creatures that can crawl, walk, or even run, without requiring cumbersome user specifications, design efforts or knowledge of algorithmic details [5].
In an interactive genetic algorithm (IGA), human judgment is used to
provide fitness, in an interactive training cycle. This cycle typically begins
with the presentation of the individuals in the current population for the
human mentor to experience them. In visual domains, where each individual
typically decodes to an image, all the individuals are usually presented at
once, often in reduced size so that the entire population can be viewed at
once. The mentor can compare and contrast the images concurrently and determine
the fitness of each individual in the context of all the others.
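One cycle of such an interactive training loop might be sketched as follows. The `mentor` callback stands in for the human: in a real IGA it would display the population and collect ratings, whereas here `prefers_red` is an automated stand-in used only so the sketch can run unattended. The RGB-tuple individuals, the `jitter` mutation and the `keep` fraction are likewise assumptions made for illustration.

```python
import random

def interactive_generation(population, mentor, mutate, rng, keep=0.5):
    """One IGA cycle: present everyone at once, collect scores, breed the favourites."""
    scores = mentor(population)  # the mentor judges each individual in context
    if len(scores) != len(population):
        raise ValueError("mentor must return one score per individual")
    ranked = [ind for _, ind in
              sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)]
    survivors = ranked[: max(1, int(len(ranked) * keep))]
    offspring = [mutate(rng.choice(survivors), rng)
                 for _ in range(len(population) - len(survivors))]
    return survivors + offspring

def jitter(colour, rng):
    """Hypothetical mutation: nudge each RGB channel, clamped to 0..255."""
    return tuple(min(255, max(0, c + rng.randint(-20, 20))) for c in colour)

def prefers_red(population):
    """Automated stand-in for a human mentor who happens to like red images."""
    return [r - (g + b) / 2 for r, g, b in population]

rng = random.Random(1)
population = [tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(12)]
initial_best = max(prefers_red(population))
for _ in range(10):
    population = interactive_generation(population, prefers_red, jitter, rng)
final_best = max(prefers_red(population))
```

The mentor receives the whole population in a single call, which mirrors the simultaneous presentation that is possible in visual domains.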
Identifying the criteria we use in our evaluations is hard enough. Justifying, or even (causally) explaining, our reliance on those criteria is more difficult still. When evolutionary computation and other computational intelligence methodologies are involved, we denote every attempt to improve aesthetic judgement as ArTbitrariness and interpret it as an interactive iterative optimization process. ArTbitrariness is also suggested as an effective way to produce art through an efficient manipulation of information and a proper use of computational creativity to increase the complexity of the results without neglecting the aesthetic aspects [6]. Our emphasis is on an approach to interactive music composition.
In the musical domain, the temporal evolution of musical events prevents the compressed, parallel presentation of individuals that is possible in computer graphics. Most of the applications of GAs to music found in the literature present the population as an evolving trajectory of musical material such as chords, motives and phrases represented as events. The net result for music, then, is that each individual in a population must be presented individually and in real time. This leads to a severe fitness bottleneck, which often limits the population size and the number of generations that can realistically be bred in a musical IGA. These limits are necessary not only to cut down on the length of time it takes to train a musical IGA, but also to help reduce the unreliability of human mentors as they attempt to sort through the individuals in a population, listening to only one sample at a time.
The human mentor runs into a fitness bottleneck in a musical IGA because the mentor's task is especially challenging. The ideal mentor would be able to reliably rank the individual members of each population according to their musical merit; however, this is clearly an unattainable goal, given the size of the populations and the inability of mentors to compare individuals easily. Another issue is that individuals can be experienced realistically only in a harmonic context, since the melodic templates only become instantiated to actual notes when played over the chords of a specific tune. Another training issue arises from the tendency of the GA machinery to converge when one highly fit individual emerges early and dominates a population. The set of musically meaningful mutation operators includes mutations that thin out overused measures and reintroduce under-used measures in the phrase population in an effort to promote diversity. Nevertheless, mentors often become tired of an overused lick and start punishing individuals in later generations that had been rewarded heavily in earlier generations.
A subclass of the field of algorithmic composition includes those applications which use the computer as a "cross" between an instrument, which a user "plays" through the application's interface, and a compositional aid, which a user experiments with in order to generate stimulating and varying musical material. Much of the work that has been done in this field has been based on the idea of determining a set of rules (constraints) which guide the generation of material, rules which are either coded explicitly or "learned" by the system through its interaction with the user. Horowitz’s development falls into this latter category, given a set of constraining assumptions from which a large number of rhythms can be generated. The system uses an interactive genetic algorithm to learn the user's criteria for distinguishing amongst rhythms. As the system learns (develops an increasingly accurate model of the function which represents the user's choices), the quality of the rhythms it produces improves to suit the user's taste. Interactive genetic algorithms are well suited to solving this problem because they allow a user to simply execute a fitness function (that is, to choose which rhythms he or she likes) without necessarily understanding the details or parameters of this function. All that a user needs to be able to do is evaluate the rhythms.
Vox Populi [7] is a hybrid made up of an instrument and a compositional environment. The population is made up of groups of four notes, which are potential solutions to a selection process based on the ordering of consonance of musical intervals. The notion of approximating a sequence of notes to its harmonically compatible note, or tonal center, is used. The resultant music moves from very pointillistic sounds to sustained chords, depending upon the duration of the genetic cycle and the number of individuals in the original population.
Tonal centres can be thought of as an approximation of the melody describing its flow. This method employs fuzzy formalism and is posed as an optimisation approach based on the factors relevant to hearing music, technically detailed in [7], available at (http://www.ia.cti.br/~artemis/voxpopuli). In the selection process, the group of voices with the highest musical fitness is selected and played. The musical fitness of each chord is a conjunction of three partial fitness functions: melody, harmony and voice range.
Musical Fitness = Melodic Fitness + Harmonic Fitness + Voice Range Fitness
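The additive structure of this fitness, and the selection of the fittest group of voices, can be sketched as follows. Only the sum of three partial terms comes from the text above; the partial functions themselves are hypothetical placeholders and do not reproduce the fuzzy formalism detailed in [7]. The MIDI note numbers, the `CONSONANCE` table and the default tonal center and range are all assumptions made for illustration.

```python
from itertools import combinations

# Assumed consonance scores per interval class (0..6 semitones); illustrative only.
CONSONANCE = {0: 1.0, 1: 0.0, 2: 0.2, 3: 0.7, 4: 0.8, 5: 0.9, 6: 0.1}

def interval_class(a, b):
    """Smallest distance in semitones between two pitches, ignoring octaves."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def melodic_fitness(chord, tonal_center):
    """Placeholder: mean consonance of each note against the tonal center."""
    return sum(CONSONANCE[interval_class(n, tonal_center)] for n in chord) / len(chord)

def harmonic_fitness(chord):
    """Placeholder: mean consonance over all pairs of notes in the chord."""
    pairs = list(combinations(chord, 2))
    return sum(CONSONANCE[interval_class(a, b)] for a, b in pairs) / len(pairs)

def voice_range_fitness(chord, low, high):
    """Placeholder: fraction of notes lying inside the allowed voice range."""
    return sum(low <= n <= high for n in chord) / len(chord)

def musical_fitness(chord, tonal_center=60, low=48, high=72):
    """Sum of the three partial fitness terms, as in the formula above."""
    return (melodic_fitness(chord, tonal_center)
            + harmonic_fitness(chord)
            + voice_range_fitness(chord, low, high))

def select_best(population, **kwargs):
    """Selection step: the group of voices with the highest musical fitness."""
    return max(population, key=lambda chord: musical_fitness(chord, **kwargs))

c_major = (48, 55, 60, 64)   # C3, G3, C4, E4
cluster = (60, 61, 62, 63)   # four adjacent semitones
chosen = select_best([cluster, c_major])
```

With these placeholder terms, each partial fitness lies between 0 and 1, so the total lies between 0 and 3, and the C major voicing is preferred over the semitone cluster.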
Unlike other systems based on genetic algorithms or evolutionary computation, in which people have to listen to and judge the musical items, Vox Populi uses the computer and the mouse as real-time music controllers, acting as a new interactive computer-based musical instrument. The interface is designed to be flexible so that the user can modify the music being generated. It explores Evolutionary Computation in the context of Algorithmic Composition and provides a graphical interface that allows the user to modify the tonal center and the voice range, changing the evolution of the music by using the mouse.
The interactive pad control supplies a graphical area in which two-dimensional curves can be drawn. These curves, one blue and one red, are each linked to other controls of the interface, as depicted in Fig. 1. The curves are traversed in the order in which they were created; their horizontal and vertical components are used for fitness evaluation and to modify the duration of the genetic cycles, interfering directly in the rhythm of the composition. Each curve describes a phase space between the linked variables.
Fig. 1 – A simple drawing and its corresponding musical output
Fig. 2 – A more complex drawing and its corresponding musical output
The graphical pad allows the composer to conduct the music through drawings, suggesting an elementary conductor's gesture. Through different drawings, the composer can experience the generated music and conduct it, trying different trajectories or sound orbits. The pad is vertically divided into three regions. Drawings at the left generate faster musical sequences; drawings at the right, slower ones.
The graphical pad may be considered one more feature available to the mentor for evaluating the music. A composer does not need to “hear” the music to guess how it sounds; he or she is able to hear it “mentally” just by reading a score. Even a performer creates a mental schema to memorise the music for a performance. In Vox Populi, pad drawings can be associated with musical sequences, allowing the mentor to use them as another evaluating feature.
The idea of producing two-dimensional pictures from a one-dimensional sound source is not new. Pronovost and colleagues describe a real-time process using analog circuits which produces two-dimensional images [8]. Pickover describes a different technique for obtaining two-dimensional images [9]. These works were aimed at helping deaf people to speak. Pellegrino describes a number of techniques for mapping sound into visual images [10]. More recently, Monro and Pressing examined the adaptation of a standard technique of mathematical analysis for the representation of sound [11]. This technique is commonly referred to as embedding, the method of delays, or the pseudo-phase space method. They show that it provides a compact way to represent multi-dimensional correlations of a musical signal; it also readily produces intricate color plots and movies that, in their view, are striking displays of sonic visual art.
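The method of delays itself is simple to state: each point of the picture is built from the signal's value at time t together with its values a fixed delay later. The following is a minimal sketch of that idea, not the specific procedure of [11]; the function name `delay_embed`, the sine test signal and the chosen delay are assumptions made for illustration.

```python
import math

def delay_embed(signal, delay, dimension=2):
    """Map a 1-D signal to points (x[t], x[t + delay], ..., x[t + (dimension-1)*delay])."""
    if delay < 1 or dimension < 1:
        raise ValueError("delay and dimension must be positive integers")
    span = delay * (dimension - 1)  # how far ahead the last coordinate looks
    return [
        tuple(signal[t + k * delay] for k in range(dimension))
        for t in range(len(signal) - span)
    ]

# A sine wave with a period of 40 samples, embedded with a quarter-period delay,
# pairs each sample with its cosine counterpart, so the points trace a circle.
period = 40
sine = [math.sin(2 * math.pi * t / period) for t in range(200)]
points = delay_embed(sine, delay=period // 4)
```

Plotting `points` as x against y gives the two-dimensional picture; a pure tone yields a closed loop, while richer signals yield more intricate orbits.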
To further remove the necessity of human interaction in the algorithmic composition process, the critics used in evolving artificial composers can be trained using easy-to-collect musical examples, rather than constructed using difficult-to-determine musical rules. Baluja, Pomerleau and Jochem [12], working in the visual domain, have trained a neural network to replace the human critic in an interactive image evolution system similar to that created by Sims. The network “watches” the choices which a human user makes when selecting two-dimensional images from one generation to reproduce in the next generation, and over time learns to make the same kind of aesthetic evaluations as those made by a human user. Since Vox Populi is a hybrid made up of an instrument and a compositional environment, such a network can accurately follow human choices during the process of composition.
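The idea of a critic that learns from recorded choices can be illustrated with a deliberately small model. The sketch below swaps the neural network of [12] for a single logistic unit trained by stochastic gradient descent, and the two-number feature vectors and the recorded choices are invented for the example; none of this describes the actual system in [12] or Vox Populi.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_critic(examples, epochs=500, lr=0.5):
    """Fit a logistic unit on (features, chosen) pairs, where chosen is 1 or 0."""
    n_features = len(examples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, chosen in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = chosen - _sigmoid(z)  # gradient of the log-likelihood
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

def critic_score(model, features):
    """Estimated probability that the user would select this individual."""
    weights, bias = model
    return _sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

# Hypothetical log of a session: two features per individual and whether
# the user selected it (1) or passed over it (0).
recorded_choices = [
    ((0.9, 0.2), 1),
    ((0.8, 0.3), 1),
    ((0.1, 0.8), 0),
    ((0.2, 0.7), 0),
]
critic = train_critic(recorded_choices)
```

Once trained, `critic_score` could be called in place of the human judgment inside an evolutionary loop, at the cost of reproducing only what the recorded choices reveal.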
Psychologically,
one of the great powers of computer programming
is the ability to define new compound operations in terms of old ones, and to
do this over and over again, thus building up a vast repertoire of ever more
complex operations. It is quite reminiscent of evolution, in which ever more
complex molecules evolve out of less complex ones, in an ever-upward spiral of
complexity and creativity. It is also quite reminiscent of the industrial
revolution, in which people used very simple early machines to help them build
more complex machines, then used those in turn to build even more complex
machines, and so on, once again in an ever upward spiral of complexity and
creativity. At each stage, whether in
evolution or revolution, the products get more flexible and more intricate,
more “intelligent” and yet, more vulnerable to delicate “bugs” or breakdowns
[13].
Evolution is a method for creating and exploring complexity that does not require human understanding of the specific process involved. This process of interactive evolution could be considered a system for helping the user with creative explorations, or it might be considered a system which attempts to “learn” about human aesthetics from the user. In either case, it allows the user and computer to work together interactively in a new way to produce results that neither could easily produce alone.
The Vox Populi interface, designed to be flexible so that the user can modify the music being generated, might be appropriate for studying human musical perception. The choices made interactively by a composer in response to the evolving music can be stored as a parametric control file and recorded as a musical signal as well. The data obtained can be used to train neural networks, which in turn may be used as fitness functions, imposing a personal style.
The mappings are easy to produce and give detailed pictures of musical sequences. This information could be used to model a sequence or a cognitive structure underlying a musical design. Finally, the use of IGAs in compositional systems is a powerful way to control the complexity of the musical material as it flows. IGAs associated with other interactive strategies, such as the graphic environment used in Vox Populi, could be a way to avoid the musical fitness bottleneck. High-dimensional curves would give the musician an overall view of the musical evolution.
The relevance of this approach goes beyond music applications per se. Computer music systems that are built on the basis of a solid theory can be coherently embedded into multimedia environments. The richness and specialty of the music domain are likely to initiate new thinking and ideas, which will have an impact on areas such as knowledge representation and planning, and on the design of visual formalisms and human-computer interfaces in general.
Part of this project is possible through the support of FAPESP to the Gesture Interface Laboratory, in which Vox Populi was developed. Fernando J. Von Zuben is supported by the CNPq grant 300910/96-7. Artemis Moroni is supported by ITI.
[1] D. B. Fogel. Evolutionary Computation - Toward a New Philosophy of Machine Intelligence, USA: IEEE Press, 46 – 47, 1995.
[2] W. Atmar. Notes on the Simulation of Evolution, IEEE Transactions on Neural Networks Vol. 5, No. 1, 130 –147, 1994.
[3] James D. Foley, Andries van Dam, Steven K. Feiner and John F. Hughes. Computer Graphics Principles and Practice, Addison-Wesley Publishing Company, p. 1018, 1996.
[4] K. Sims. Interactive Evolution of Equations for Procedural Models, The Visual Computer Vol. 9, No. 9, 466 – 476, 1993.
[5] K. Sims. Evolving Three-Dimensional Morphology and Behaviour. Evolutionary Design by Computers, ed. Peter J. Bentley, Morgan Kaufmann, 297 – 321, 1999.
[6] A. Moroni, J. Manzolli, F. Von Zuben, R. Gudwin. Evolutionary Computation applied to Algorithmic Composition, Proceedings of CEC99 - IEEE International Conference on Evolutionary Computation, Washington D. C., p. 807 – 810, 1999.
[7] J. Manzolli, A. Moroni, F. Von Zuben, R. Gudwin. An Evolutionary Approach Applied to Algorithmic Composition, Proceedings of SBC’99 - XIX National Congress of the Computation Brazilian Society, Rio de Janeiro, Vol. 3, 201 – 210, 1999.
[8] W. Pronovost, L. Yenkin, D. Anderson and R. Lerner. The Voice Visualizer, American Annals of the Deaf, 113: 230 – 238, 1968.
[9] C. Pickover, On the Use of Symmetrized Dot Patterns for the Visual Characterization of Speech Waveforms and Other Sampled Data, Journal of the Acoustical Society of America 80(3): 955 – 960, 1986.
[10] R. Pellegrino. The Electronic Arts of Sound and Light, New York: Van Nostrand, 1993.
[11] G. Monro and J. Pressing. Sound Visualization using Embedding: The Art and Science of Auditory Autocorrelation, Computer Music Journal, 22:2, 20 – 34, 1998.
[12] S. Baluja, D. Pomerleau and T. Jochem. Towards automated artificial evolution for computer-generated images, Connection Science, 6(2-3), 325 – 354, 1994.
[13] D. R. Hofstadter. Metamagical Themas, USA: Basic Books, p. 693, 1985.