Musical ArTbitrariness


Sr Technologist Artemis Moroni, MSc

Information Technology Institute,  Campinas, SP, Brazil



Prof. F. J. Von Zuben, PhD

Department of Computer Engineering and Industrial Automation

School of Electrical and Computer Engineering

State University of Campinas, SP, Brazil


Dr. J. Manzolli, MSc, PhD

Interdisciplinary Nucleus for Sound Studies

State University of Campinas, SP, Brazil



Evolution is now considered not only powerful enough to bring about biological entities as complex as humans and consciousness, but also useful in simulation to create algorithms and structures of higher levels of complexity than could easily be built by design. In what follows, we present an attempt to perform interactive iterative optimisation using evolutionary computation and other computational intelligence methodologies. When artistic aspects are involved, every attempt to improve aesthetic judgment is denoted here as ArTbitrariness. Our emphasis is on an approach to interactive music composition.

When human judgment replaces a formal fitness criterion, we still have a fitness surface, but this surface cannot be expressed in mathematical terms. For music, phase spaces are suggested as one more feature of the visual domain that may be used as a fitness function to explore the musical domain.

1.    Introduction

Complexity may be defined as the situation in which, given the properties of the parts and the laws of their interaction, it is not an easy matter to infer the properties of the whole. The complexity of a system may also be described not only in terms of the number of interacting parts, but in terms of their differences in structure and function. There are special problems, however, when examining systems of high complexity: a system may have so many different aspects that a complete description of it is quite impossible, and so is a prediction of its productions. The analysis of such a system would require the destruction of the system and thus preclude the completion of the analysis. Nevertheless, on examining the estimated sequence of organisms, the generic conclusion is that the earliest forms were simple and small and that, as evolution progressed, increasingly complex entities appeared.

There is enormous complexity in living systems, complexity which does not persist for its own sake but rather is maintained as entities at one level are compounded into new entities at the next higher level. Increased complexity allows for a greater range of potential behaviors and thus provides a competitive advantage [1]. Behavioral error, as measured in the current environmental context, is the sole quality sieved by competitive selection. Behavioral error is measured by the costs and consequences of incorrectly predicting forthcoming sequences of environmental symbols [2].

Evolution is now considered not only powerful enough to bring about biological entities as complex as humans and consciousness, but also useful in simulation to create algorithms and structures of higher levels of complexity than could easily be built by design. Both biological and simulated evolution involve the basic concepts of a genotype and a phenotype, and the processes of expression, selection and reproduction with variation. The genotype is the genetically encoded information for the creation of an individual. The phenotype is the individual itself, or the form that results from the developmental rules and the genotype. Expression is the process by which a phenotype is generated from a genotype. Selection depends on the process by which the fitness of phenotypes is determined. Fitness is simply the ability of an organism to survive and reproduce. Reproduction is the process by which new genotypes are generated from an existing genotype.

For evolution to progress, there must be variation, or mutations, in new genotypes with some frequency. Mutations are usually probabilistic, as opposed to deterministic. Selection is, in general, non-random and is performed on phenotypes, while variation is usually random and is performed on the corresponding genotypes. The repeated cycle of reproduction with variation and selection of the fittest individuals drives the evolution of a population towards higher and higher levels of fitness. Sexual combination allows genetic material from more than one parent to be mixed together in some way to create new genotypes. This permits features to evolve independently and later be combined into an individual genotype. Although sexual combination is not necessary for evolution to occur, it is valuable for enhancing progress in both biological and simulated evolution. Representations for genotypes that are not limited to fixed spaces and can grow in complexity have proven worthwhile.
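The cycle described above (expression, selection on phenotypes, reproduction with random variation on genotypes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from any cited system: the bit-string genotype, the toy target-matching fitness, truncation selection of the better half, and the elitism step that carries the best individual over unchanged.

```python
import random

def express(genotype):
    # Expression: decode the bit-string genotype into a phenotype (an integer).
    return int("".join(map(str, genotype)), 2)

def fitness(phenotype, target=200):
    # Toy fitness (assumption): the closer to an arbitrary target, the fitter.
    return -abs(phenotype - target)

def crossover(a, b, rng):
    # Sexual combination: one-point crossover of two parent genotypes.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genotype, rate, rng):
    # Probabilistic variation: each bit flips with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genotype]

def evolve(pop_size=20, length=8, generations=40, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        # Selection is non-random and acts on phenotypes.
        ranked = sorted(population, key=lambda g: fitness(express(g)),
                        reverse=True)
        history.append(fitness(express(ranked[0])))
        parents = ranked[: pop_size // 2]
        # Variation is random and acts on genotypes.
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rate, rng)
                    for _ in range(pop_size - 1)]
        population = [ranked[0]] + children  # elitism (assumption)
    best = max(population, key=lambda g: fitness(express(g)))
    return best, history
```

Because the best individual is always kept, the recorded best fitness never decreases from one generation to the next; without elitism this would not be guaranteed.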

The next section discusses interactive genetic algorithms applied to the visual domain and defines ArTbitrariness. The emphasis then shifts to an approach to interactive music composition denoted Vox Populi, a hybrid made up of an instrument and a compositional environment, appropriate for the study of human perception in the musical domain.

2.    Interactively Evolving Graphics

In computer graphics, procedural models are employed to create scenes and animations with high degrees of complexity. Procedural models are used to describe objects that can interact with external events to modify themselves. Thus, a model of a sphere that generates a polygonal representation of the sphere at a requested fineness of subdivision is procedural: the actual model is determined by the fineness parameter. A model that determines the origin of its coordinate system by requesting information from nearby entities is also procedural. The price paid for this complexity is that the user often loses the ability to maintain sufficient control over the results. A collection of polygons specified by their vertices, by contrast, is not a procedural model [3].
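As a small illustration of the sphere example, the sketch below generates the vertices of a polygonal sphere on request, at whatever fineness is asked for. The latitude/longitude subdivision scheme and the name `sphere_vertices` are assumptions chosen for brevity; they are not taken from [3].

```python
import math

def sphere_vertices(radius, fineness):
    # Procedural model: the vertex list is not stored but computed from the
    # fineness parameter (number of latitude bands; twice as many longitude
    # segments). Pole vertices are repeated once per segment for simplicity.
    vertices = []
    for i in range(fineness + 1):
        theta = math.pi * i / fineness          # polar angle, 0..pi
        for j in range(2 * fineness):
            phi = math.pi * j / fineness        # azimuth, 0..2*pi
            vertices.append((radius * math.sin(theta) * math.cos(phi),
                             radius * math.sin(theta) * math.sin(phi),
                             radius * math.cos(theta)))
    return vertices
```

Calling `sphere_vertices(1.0, 4)` and `sphere_vertices(1.0, 16)` yields a coarse and a fine version of the same object, whereas a fixed list of vertices could describe only one of them.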

Procedural models can also have limitations because the details of the procedure must be conceived, understood, and designed by humans. The techniques introduced by Sims [4] contribute towards solutions to these problems by enabling the “evolution” of procedural models using interactive “perceptual selection”. Evolutionary mechanisms of variation and selection are used to “evolve” complex equations used in procedural models for computer graphics and animation. An interactive process between the user and the computer allows the user to guide evolving equations by observing results and providing aesthetic information at each step of the process. The computer automatically generates random mutations of equations and combinations between equations to create new generations of results. This repeated interaction between user and computer allows the user to search hyperspaces of possible equations without being required to design the equations by hand or even understand them. Sims also successfully applied genetic algorithms to generate autonomous three-dimensional virtual creatures that can crawl, walk, or even run, without requiring cumbersome user specifications, design efforts or knowledge of algorithmic details [5].

In an interactive genetic algorithm (IGA), human judgment is used to provide fitness in an interactive training cycle. This cycle typically begins with the presentation of the individuals in the current population for the human mentor to experience. In visual domains, where each individual typically decodes to an image, all the individuals are usually presented together, often in reduced size so that the entire population can be viewed at once. The mentor can compare and contrast the images concurrently and determine the fitness of each individual in the context of all the others.
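One generation of such a training cycle might be sketched as follows. The human mentor is represented by a callback, `rate_all`, that receives the whole population at once and returns one score per individual; in a real system this would display the images and collect the mentor's ratings. The parameter-vector genotype, uniform crossover and Gaussian mutation are assumptions made for illustration.

```python
import random

def interactive_generation(population, rate_all, rng, mutation_rate=0.1):
    # The mentor judges every individual in the context of all the others.
    scores = rate_all(population)
    ranked = [ind for _, ind in sorted(zip(scores, population),
                                       key=lambda pair: pair[0],
                                       reverse=True)]
    parents = ranked[: max(2, len(ranked) // 2)]
    next_population = []
    while len(next_population) < len(population):
        a, b = rng.sample(parents, 2)
        # Uniform crossover: each gene comes from one of the two parents.
        child = [rng.choice(genes) for genes in zip(a, b)]
        # Gaussian mutation, clamped to the valid parameter range [0, 1].
        child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                 if rng.random() < mutation_rate else g
                 for g in child]
        next_population.append(child)
    return next_population
```

For testing without a person in the loop, a simulated mentor such as `lambda pop: [sum(ind) for ind in pop]` can stand in for the human judgment.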

Identifying the criteria we use in our evaluations is hard enough. Justifying, or even (causally) explaining, our reliance on those criteria is more difficult still. When evolutionary computation and other computational intelligence methodologies are involved, we denote every attempt to improve aesthetic judgement as ArTbitrariness and interpret it as an interactive iterative optimization process. ArTbitrariness is also suggested as an effective way to produce art through an efficient manipulation of information and a proper use of computational creativity to increase the complexity of the results without neglecting the aesthetic aspects [6]. Our emphasis is on an approach to interactive music composition.

3.    Interactively Evolving Music

In the musical domain, the temporal evolution of musical events prevents the compressed, parallel presentation of individuals that is possible in computer graphics. Most of the applications of GAs to music found in the literature present the population as an evolving trajectory of musical material such as chords, motives and phrases represented as events. The net result for music, then, is that each individual in a population must be presented individually and in real time. This leads to a severe fitness bottleneck, which often limits the population size and the number of generations that can realistically be bred in a musical IGA. These limits are necessary not only to cut down on the length of time it takes to train a musical IGA, but also to help reduce the unreliability of human mentors as they attempt to sort through the individuals in a population, listening to only one sample at a time.

The human mentor runs into a fitness bottleneck in a musical IGA because the mentor's task is especially challenging. The ideal mentor would be able to reliably rank the individual members of each population according to their musical merit; however, this is clearly an unattainable goal, given the size of the populations and the inability of mentors to compare individuals easily. Another issue is that individuals can be experienced realistically only in a harmonic context, since the melodic templates only become instantiated to actual notes when played over the chords of a specific tune. A further training issue arises from the tendency of the GA machinery to converge when one highly fit individual emerges early and dominates a population. The set of musically meaningful mutation operators includes mutations that thin out overused measures and reintroduce under-used measures in the phrase population in an effort to promote diversity. Nevertheless, mentors often become tired of an overused lick and start punishing individuals in later generations that had been rewarded heavily in earlier generations.

4.    Interactively Interfering in Music Evolution

A subclass of the field of algorithmic composition includes those applications which use the computer as a "cross" between an instrument, which a user "plays" through the application's interface, and a compositional aid, which a user experiments with in order to generate stimulating and varying musical material. Much of the work that has been done in this field has been based on the idea of determining a set of rules (constraints) which guide the generation of material, rules which are either coded explicitly or "learned" by the system through its interaction with the user. Horowitz’s development falls into this latter category: it starts from a set of constraining assumptions from which a large number of rhythms can be generated. The system uses an interactive genetic algorithm to learn the user's criteria for distinguishing amongst rhythms. As the system learns (develops an increasingly accurate model of the function which represents the user's choices), the quality of the rhythms it produces improves to suit the user's taste. Interactive genetic algorithms are well suited to solving this problem because they allow a user to simply execute a fitness function (that is, to choose which rhythms he likes) without necessarily understanding the details or parameters of this function; all that a user needs to be able to do is evaluate the rhythms.

Vox Populi [7] is a hybrid made up of an instrument and a compositional environment. The population is made up of groups of four notes, which are potential solutions for a selection process that orders them by the consonance of their musical intervals; the notion of approximating a sequence of notes to its harmonically compatible note, or tonal center, is used. The resultant music moves from very pointillistic sounds to sustained chords, depending upon the duration of the genetic cycle and the number of individuals in the original population.

Tonal centres can be thought of as an approximation of the melody describing its flow. This method employs fuzzy formalism and is posed as an optimisation approach based on the factors relevant to hearing music, technically detailed in [7]. In the selection process, the group of voices with the highest musical fitness is selected and played. The musical fitness of each chord is a conjunction of three partial fitness functions: melody, harmony and voice range.

Musical Fitness = Melodic Fitness + Harmonic Fitness + Voice Range Fitness
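The sum of the three partial fitness functions, and the selection of the fittest group of four voices, can be sketched as below. The actual partial fitness functions are fuzzy and are detailed in [7]; the ones used here are simplified stand-ins, and the consonance table, the MIDI note numbers, the default tonal center and the voice range limits are all hypothetical values chosen only to make the example run.

```python
from itertools import combinations

# Hypothetical consonance score per interval in semitones (mod 12);
# these numbers are illustrative and are not taken from [7].
CONSONANCE = {0: 1.0, 7: 0.9, 5: 0.9, 4: 0.8, 3: 0.8, 8: 0.7, 9: 0.7,
              2: 0.3, 10: 0.3, 6: 0.2, 1: 0.1, 11: 0.1}

def melodic_fitness(chord, tonal_center):
    # Stand-in: average consonance of each note against the tonal center.
    return sum(CONSONANCE[abs(n - tonal_center) % 12]
               for n in chord) / len(chord)

def harmonic_fitness(chord):
    # Stand-in: average consonance over all pairs of notes in the chord.
    pairs = list(combinations(chord, 2))
    return sum(CONSONANCE[abs(a - b) % 12] for a, b in pairs) / len(pairs)

def voice_range_fitness(chord, low, high):
    # Stand-in: fraction of notes that fall inside the allowed range.
    return sum(low <= n <= high for n in chord) / len(chord)

def musical_fitness(chord, tonal_center=60, low=48, high=72):
    # Musical Fitness = Melodic + Harmonic + Voice Range Fitness.
    return (melodic_fitness(chord, tonal_center)
            + harmonic_fitness(chord)
            + voice_range_fitness(chord, low, high))

def select_best(population, **controls):
    # The group of voices with the highest musical fitness is the one played.
    return max(population, key=lambda chord: musical_fitness(chord, **controls))
```

With these stand-in values, a C major chord such as `[60, 64, 67, 72]` scores higher than a chromatic cluster such as `[60, 61, 62, 63]`, so `select_best` would choose the former. Changing `tonal_center`, `low` or `high` changes which chord wins, which is the kind of control the interface exposes to the user.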

Unlike other systems found in genetic algorithms or evolutionary computation, in which people have to listen to and judge the musical items, Vox Populi uses the computer and the mouse as real-time music controllers, acting as a new interactive computer-based musical instrument. The interface is designed to be flexible so that the user can modify the music being generated. It explores Evolutionary Computation in the context of Algorithmic Composition and provides a graphical interface that allows the user to modify the tonal center and the voice range, changing the evolution of the music by using the mouse.

The interactive pad control supplies a graphical area in which two-dimensional curves can be drawn. These curves, a blue one and a red one, are linked to other controls of the interface, as depicted in Fig. 1. The curves are traversed in the order in which they were created; their horizontal and vertical components are used for fitness evaluation and to modify the duration of the genetic cycles, interfering directly in the rhythm of the composition. Each curve describes a phase space between the linked variables.



Fig. 1 – A simple drawing and its corresponding musical output



Fig. 2 – A more complex drawing and its corresponding musical output

The graphical pad allows the composer to conduct the music through drawings, suggesting an elementary conductor's gesture. With different drawings, the composer can experience the generated music and conduct it, trying different trajectories or sound orbits. The pad is divided vertically into three regions. Drawings on the left generate faster musical sequences; drawings on the right, slower ones.
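The left-to-right mapping from pad position to tempo might be sketched as follows. The pad width, the three cycle durations and the function names are hypothetical; the text only states that there are three regions and that the left side is faster than the right.

```python
# Hypothetical genetic-cycle durations, in seconds, for the left, middle
# and right regions of the pad (shorter cycle = faster musical sequence).
CYCLE_SECONDS = (0.25, 0.5, 1.0)

def region(x, width):
    # 0 = left third, 1 = middle third, 2 = right third of the pad.
    return min(2, int(3 * x / width))

def traverse(curve, width=300):
    # Visit the points of a drawn curve in the order they were created,
    # yielding the cycle duration chosen by the horizontal component and
    # passing the vertical component on to whatever control it is linked to.
    for x, y in curve:
        yield CYCLE_SECONDS[region(x, width)], y
```

A curve drawn from left to right would therefore slow the music down progressively as it is traversed.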

The graphical pad may be considered one more feature available to the mentor for evaluating the music. A composer does not need to “hear” the music to guess how it sounds; he is able to hear it “mentally” just by reading a score. Even a performer creates a mental schema to memorise the music for a performance. In Vox Populi, pad drawings can be associated with musical sequences, allowing the mentor to use them as another feature for evaluation.

The idea of producing two-dimensional pictures from a one-dimensional sound source is not new. Pronovost and colleagues describe a real-time process using analog circuits which produces two-dimensional images [8]. Pickover describes a different technique for obtaining two-dimensional images [9]. These works were aimed at helping deaf people to speak. Pellegrino describes a number of techniques for mapping sound into visual images [10]. More recently, Monro and Pressing examined the adaptation of a standard technique of mathematical analysis for the representation of sound [11]. This technique is commonly referred to as embedding, the method of delays, or the pseudo-phase space method. They show that it provides a compact way to represent multi-dimensional correlations of a musical signal; it also readily produces intricate color plots and movies that, in their view, are striking displays of sonic visual art.
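The method of delays itself is simple to state: each sample of the one-dimensional signal is paired with the sample that occurs a fixed delay later, and the pairs are plotted as points. The sketch below is a generic version of that idea, not the specific procedure of [11]; the function name and the choice of delay are assumptions.

```python
import math

def delay_embed(signal, delay, dim=2):
    # Build points (s[i], s[i + delay], ..., s[i + (dim - 1) * delay]).
    n = len(signal) - (dim - 1) * delay
    return [tuple(signal[i + k * delay] for k in range(dim))
            for i in range(n)]

# Example: a sine wave with a period of 40 samples, embedded with a delay
# of a quarter period, traces a circle in the pseudo-phase space.
sine = [math.sin(2 * math.pi * t / 40) for t in range(200)]
orbit = delay_embed(sine, delay=10)
```

A pure tone gives a closed orbit; richer signals give more intricate figures, which is what makes the plots usable as a visual summary of a musical sequence.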

To further remove the necessity of human interaction in the algorithmic composition process, the critic used in evolving artificial composers can be trained using easy-to-collect musical examples, rather than constructed using difficult-to-determine musical rules. Baluja, Pomerleau and Jochem [12], working in the visual domain, have trained a neural network to replace the human critic in an interactive image evolution system similar to that created by Sims. The network “watches” the choices which a human user makes when selecting two-dimensional images from one generation to reproduce in the next generation, and over time learns to make the same kind of aesthetic evaluations as those made by a human user. Since Vox Populi is a hybrid made up of an instrument and a compositional environment, the network can accurately follow human choices during the process of composition.


Psychologically, one of the great powers of computer programming is the ability to define new compound operations in terms of old ones, and to do this over and over again, thus building up a vast repertoire of ever more complex operations. It is quite reminiscent of evolution, in which ever more complex molecules evolve out of less complex ones, in an ever-upward spiral of complexity and creativity. It is also quite reminiscent of the industrial revolution, in which people used very simple early machines to help them build more complex machines, then used those in turn to build even more complex machines, and so on, once again in an ever-upward spiral of complexity and creativity. At each stage, whether in evolution or revolution, the products get more flexible and more intricate, more “intelligent” and yet more vulnerable to delicate “bugs” or breakdowns [13].

Evolution is a method for creating and exploring complexity that does not require human understanding of the specific process involved. This process of interactive evolution could be considered a system for helping the user with creative explorations, or it might be considered a system which attempts to “learn” about human aesthetics from the user. In either case, it allows the user and computer to work together interactively in a new way to produce results that neither could easily produce alone.

The Vox Populi interface, designed to be flexible so that the user can modify the music being generated, might be appropriate for studying human musical perception. The choices made interactively by a composer in response to the evolving music can be stored as a parametric control file and recorded as a musical signal as well. The data obtained can be applied to training neural networks, which in turn may be used as fitness functions, imposing a personal style.

The mappings are easy to produce and give detailed pictures of musical sequences. This information could be used to model a sequence or a cognitive structure underlying a musical design. Finally, the use of IGAs in compositional systems is a powerful way to control the complexity of the musical material in the flow. An IGA associated with other interactive strategies, such as the graphic environment used in Vox Populi, could be a strategy to avoid the musical fitness bottleneck. High-dimensional curves would give the musician an overall view of the musical evolution.

The relevance of this approach goes beyond music applications per se. Computer music systems that are built on the basis of a solid theory can be coherently embedded into multimedia environments. The richness and specialty of the music domain are likely to initiate new thinking and ideas, which will have an impact on areas such as knowledge representation and planning, and on the design of visual formalisms and human-computer interfaces in general.


Part of this project is possible through the support of FAPESP to the Gesture Interface Laboratory in which VoxPopuli was developed. Fernando J. Von Zuben is supported by the CNPq grant 300910/96-7. Artemis Moroni is supported by ITI.


[1] D. B. Fogel. Evolutionary Computation - Toward a New Philosophy of Machine Intelligence, USA: IEEE Press, 46 – 47, 1995.

[2] W. Atmar. Notes on the Simulation of Evolution, IEEE Transactions on Neural Networks Vol. 5, No. 1, 130 – 147, 1994.

[3] James D. Foley, Andries van Dam, Steven K. Feiner and John F. Hughes. Computer Graphics: Principles and Practice, Addison-Wesley Publishing Company, p. 1018, 1996.

[4] K. Sims. Interactive Evolution of Equations for Procedural Models, The Visual Computer Vol. 9, No. 9, 466 -- 476, 1993.

[5] K. Sims. Evolving Three-Dimensional Morphology and Behaviour. Evolutionary Design by Computers, ed. Peter J. Bentley, Morgan Kaufmann, 297 – 321, 1999.

[6] A. Moroni, J. Manzolli, F. Von Zuben, R. Gudwin. Evolutionary Computation applied to Algorithmic Composition, Proceedings of CEC99 - IEEE International Conference on Evolutionary Computation, Washington D. C., p. 807 – 810, 1999.

[7] J. Manzolli, A. Moroni, F. Von Zuben, R. Gudwin. An Evolutionary Approach Applied to Algorithmic Composition, Proceedings of SBC’99 - XIX National Congress of the Computation Brazilian Society, Rio de Janeiro, Vol. 3, 201 -- 210, 1999.

[8] W. Pronovost, L. Yenkin, D. Anderson and R. Lerner. The Voice Visualizer, American Annals of the Deaf, 113: 230 – 238, 1968.

[9] C. Pickover, On the Use of Symmetrized Dot Patterns for the Visual Characterization of Speech Waveforms and Other Sampled Data, Journal of the Acoustical Society of America 80(3): 955 – 960, 1986.

[10] R. Pellegrino, The Electronic Arts of Sound and Light, New York: Van Nostrand, 1993

[11] G. Monro and J. Pressing. Sound Visualization using Embedding: The Art and Science of Auditory Autocorrelation, Computer Music Journal, 22:2, 20 – 34, 1998.

[12] S. Baluja, D. Pomerleau and T. Jochem (1994). Towards automated artificial evolution for computer-generated images, Connection Science, 6(2-3), 325 -- 354.

[13] D. R. Hofstadter (1985). Metamagical Themas, USA: Basic Books, p. 693.