Creating Audio-Visual Metaphor

Nature has given us the ability to see, hear, smell and touch. These sensory perceptions make us aware of what is happening in our surroundings and keep us away from danger so that we can survive. Thanks to the efficiency of nature, the total result that we get from these perceptions is far more than the simple sum of its parts.

In a piece of audio-visual artwork, the audio-visual language covers two fundamental types of sensory perception. What the audience expects when seeing the visual effects and hearing the sonic content is not merely a rhythmic coherence between audio and visuals, but also the abstract concept or metaphor that is represented or constructed both by the audio and visual elements themselves and by the internal relationship between them.

In an audio-visual ensemble, we can consider audio and visuals as two players in a band. Rather than producing sound that reacts to video, or playing video that reacts to audio, what this band can and should do is organize audio and visual elements into particular metaphors and integrate these metaphors to create the narrative of the performance. Instruments in a band are played according to a well-composed structure in order to achieve an effect of 1+1>2, and an audiovisual ensemble can work the same way. Audio and visual elements do not necessarily have to follow each other; they can instead be organized to complement each other.

The filmmaker Sergei Eisenstein maintained that in filmmaking audio and visual elements should not simply accompany each other in dependent synchronization, but should be structured into a more complex composition (Robertson, 2011). As filmmaking is also a process of expressing audio-visual language, this theory can also be applied to audio-visual live performance. Another area that deploys audio-visual language is the game industry. One recent mobile game, Monument Valley, provides a good example of the complementary effect of audio and visual elements. During puzzle solving, when players operate the in-game controls, notes are produced while something changes visually, and different notes represent different degrees of change as the handle is turned, even though such differences might not be obvious to the eye. In this way, sound contributes to the gameplay rather than being just a sound effect that decorates the visuals.

Therefore, in the construction of an audio-visual ensemble performance, while the rhythmic coherence shows the external link between audio and visual elements, the creation and organization of metaphor reveals the internal integration of audio and visual.

As far as I am concerned, methods for creating audio-visual metaphor can be grouped into two types.

One way is to use different combinations of audio and visual elements to create different metaphors that represent a specific physical event or emotional situation. For example, while the combination of soft music and a cloudy day might deliver a sense of sadness, the combination of intense music and a cloudy day could express anger. Similarly, leisurely sound with a sunny day makes people feel relaxed, while rhythmic music with a sunny day makes them feel excited.

Another way is to represent a particular phenomenon or feeling with either an audio or a visual element, and to organize such elements together to create a context for the narrative. The storytelling flow can then be developed through composition within this context. In my experience, the creation of metaphor in our project followed this second way.

We chose the allegory of the cave as the framework of the narrative. While constructing the storyline, we allocated our existing resources as representations of different elements in the allegory. The audience was placed in the situation of the character in the allegory, the prisoner chained in the cave. Graphics on the main screen represented the subjective consciousness of the character, who was only allowed to see the shadows in front of him. On the screen there were lights controlled by audio input, and shadows that came along with the lights. The shadows represented the character’s opinion of the world, even though they were actually illusions. The lights provided an environment for the narrative: through flickering and switches of colour, they provided “keynotes” for each section of the story flow. At the same time, video projected on the surrounding walls was an abstract metaphor for the reality outside the cave, and the combination of abstract lines and colours revealed the clash and struggle of opinions.

While the sonic content structured the story flow, visual elements filled the space by illustrating each narrative section with a different combination of metaphors. At first, as the prisoners were only able to see the shadows in front of them, the visual effects existed mainly on the main screen. Shadows in this part seemed scary and evil. As the story developed, struggle appeared in the character’s thinking, shown by the lines and colours. When the character escaped from the cave, videos were displayed on the walls and the shadows were muted. Then, with the return to the cave, all three visual elements finally appeared at the same time, as new and old theories clashed and created a dramatic conflict. At the end of the performance, while the video on the walls implied a probable sad ending for the innovator, the shadows on the screen showed symbols of the beauty of the real world, implying that the seed of wisdom had already been planted in people’s minds.

By creating audio-visual metaphors and placing them appropriately on the timeline, the audio and visual elements of the project were finally tightly integrated. However, many details still need to be improved. In my opinion, the combinations of metaphors could be more diverse, and the content of the metaphors in different sections could be more clearly differentiated to achieve a better dramatic effect.


Robertson, Robert. 2011. Eisenstein on the Audiovisual: The Montage of Music, Image and Sound in Cinema. New York, NY: I.B. Tauris.

Gadassik, Alla. 2013. A Review of “Eisenstein on the Audiovisual: The Montage of Music, Image and Sound in Cinema”, Quarterly Review of Film and Video, 30:4, 377-381.

Inside Out

Human experience, the one we always, already, are situated in, is something mysterious and, a philosopher said, intrinsically paradoxical. Yet, we can’t help but look for sense and meaning throughout it, and an audio/visual performance is a very powerful tool to say and discover something about it. It is often said that its strength lies in the interaction between sounds and visuals, and that is certainly true; however, the same can be said of two musical instruments within a well-composed piece. Therefore, what is it that really makes an audio/visual performance such an engaging event? I will try to make a few hypotheses based on the experience I gained throughout the Studio Project.

To use an overused phrase, people can be divided into two categories when listening to music: some like to shut their eyes, while others let them wander around. Sometimes I belong to the first group, and I can even end up placing my fingers where the forehead meets the nose, in an attempt to keep my eyes shut; when I do that, it’s because what I see in my mind doesn’t really match what’s physically around me. Moreover, it is the music itself that is evoking images in my mind, which would be weakened by the outside world’s interference. This can happen either when the music doesn’t engage me much or when what I see around me is too mismatched and invasive, but in any case the gesture seems to have the power to amplify the music through the amplification of inner images. Actually, if one thinks about it carefully, the opposite is also true: images and pictures demand, often strongly, a sonic imaginative act:


This seems to be a relevant result for understanding the mechanism behind audio/visual engagement: in fact, if an artist is able to provide visual inputs that not only match, but even support and boost the musical side (and vice-versa), the positive feedback can really open an experiential dimension beyond mere sound and mere image.

For my performance I chose not to have just a big screen for the audience to stare at the whole time, but rather a three-dimensional environment that could represent a mental space, inhabited by entities of different natures. This space is defined by four televisions and a subwoofer, which form an ideal flat surface, separating the audience from the stage. There are a number of things that can be said about the TV & sub complex. First, it can suggest a criticism of facile audio/visual approaches, which only juxtapose the two; from this point of view, the flatness of the screens couldn’t be more different from the subwoofer, which is a black box that disappears in the dark and whose sonic emission, for physical reasons, is almost non-directional (if someone really wanted to push this observation further, s/he could see in it Heidegger’s criticism of being as simple-presence versus being as emerging within a world…). Secondly, it could suggest a criticism of consumerism: after all, I was able to collect, for free, working televisions that people were discarding just because they “want to upgrade” (a direct quotation from one of the ex-owners). Moreover, there is a theme which is probably specific to Italy: over many years spent there, I realized that a large part of public opinion was manipulated by a constant bombardment of lies via television, and the way those lies were passed off as truth was to create a fake world that would turn people into superficial persons. Remarkably, the word surface translates into Italian as superficie, giving a linguistic identity between the physical and the moral concepts. At the same time, however, televisions per se are just a representational medium, and this is the reason why we find the famous quotation by Magritte ceci n’est pas une pipe, transformed into ceci n’est pas un homme, meaning “this is not a human being” (the linguistic gender issue is here perhaps exaggerated).
From this perspective, the TV “wall” also becomes a gate, a perceptual door that invites us inside a new, psychical space.


As well as inviting in, doors usually protect what’s behind them. In this case they are protecting an internal, intimate space. The figure of the woman is emblematic in this perspective, symbolizing the fact that whatever tries to go outside gets transmuted, chopped up, destroyed. Above all, her hand is (hopefully) a powerful image representing our internal thoughts and feelings trying to connect to the external world, but failing. The typical theme of incommunicability is here presented.

The mannequin is in a contrapuntal relationship with the performer: as said, he is safe and protected, while she dies on the surface. But also: she is fake while he is alive, she is static while he flounders, she is naked while he is dressed. As always in counterpoint, there are also connections: his gestures cast light on her, and she is a woman but on her body we read homme (French for “man” as well as for “human being”).

At the far back of the stage, a strong red light cast on an elevated curtain recalls the inscrutable wildness of the human unconscious.

The other side of the surface (the one defined by the TVs) is the reign of sound, in which the audience is fully immersed. Sounds circulate, with no preferred direction, between four speakers placed at the corners. The strong spatial difference between the flatness of the television screens and the permeating omnipresence of sound is a way of highlighting the theme of the surface, while at the same time it suggests a deep space that the eye is invited to look for on the stage; the nature of the sounds themselves, with an extreme use of several different kinds of resonances, pushes in the same direction.

I thought that a good title for the performance would be On The Surface.

final performance: technical details

The performance I presented as the final project involves an audio/visual instrument composed of a few different systems.


The audio processes are essentially embodied by a Max (Cycling’74) patch. It is a rather complex system and makes use of many non-native objects as well as one proprietary process (a compressor), so I invite anyone interested in having it to contact me at my e-mail address for instructions.


The first sound heard in the piece is the signal of a microphone attached to a wooden box; the signal is then processed by two sets of eight resonant filter banks, each featuring eleven filters and each being independent from the others (they can keep ringing while other ones are excited). Let us consider the first set: each bank has fixed gains for each filter and a fixed fundamental frequency assigned to the first one, but this first filter has a very low gain, so that it is hard to perceive compared to the others. The remaining ten filters have variable frequencies that are most of the time aliased beyond the Nyquist frequency and thus symmetrically reflected within the audible spectrum, which helps to make the fundamental frequency of each bank imperceptible and to achieve a general timbral complexity. The current active bank is selected after each attack exceeding a value of 2.5 % of the full digital scale, while frequencies are shifted after a certain number of these attacks; this number is controlled by the height of the performer’s left shoulder. The second set works in a slightly different way: here the fundamental frequency of each bank is more audible and lower, and there is no aliasing, so that the final result is more similar to standard modal processes; however, fundamental frequencies across the banks are more spread out (55 to 220 Hz), so variety is preserved. The current bank is selected after each attack exceeding a value of 2.0 % of the full digital scale, and frequencies are always moving according to a random signal, whose rate and scale, though, are controlled by the height of the performer’s left shoulder. For both sets, the Q factor of the filters is scaled depending on their index (the higher the index, the higher the Q, but because of the aliasing a higher index does not mean a higher frequency) and globally multiplied by the value of the height of the performer’s shoulder.
Lastly, the bank selection is not abrupt, but happens via a smooth routing with a ramp time of 5 milliseconds: this feature had to be introduced to deal with more continuous signals coming from the microphone; for similar reasons, a V-shaped envelope with a total time of 10 milliseconds multiplies the audio input so that, when frequencies of the first bank are shifted, its level is zero.
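The aliasing and attack-driven bank selection described above can be illustrated with a few lines of Python. This is only a minimal sketch and not the Max patch itself: the 44.1 kHz sample rate, the function names (`fold_frequency`, `count_attacks`, `next_bank`), the simple threshold-crossing attack detector and the round-robin bank order are all assumptions made for illustration; only the 2.5 % threshold and the eight banks come from the description.

```python
def fold_frequency(freq_hz, sample_rate=44100.0):
    """Return the frequency heard when freq_hz is rendered at sample_rate.

    Frequencies above Nyquist (sample_rate / 2) are mirrored back into
    the audible band. The sample rate is an assumed value.
    """
    nyquist = sample_rate / 2.0
    wrapped = freq_hz % sample_rate  # aliasing repeats every sample_rate
    return sample_rate - wrapped if wrapped > nyquist else wrapped


def count_attacks(samples, threshold=0.025):
    """Count upward crossings of the threshold (2.5 % of full scale).

    Samples are assumed to lie in -1.0..1.0. A real detector would
    probably follow an envelope rather than raw sample values.
    """
    attacks = 0
    above = False
    for sample in samples:
        now_above = abs(sample) > threshold
        if now_above and not above:
            attacks += 1
        above = now_above
    return attacks


def next_bank(current_bank, attacks, num_banks=8):
    """Advance the active bank once per attack (round-robin is assumed)."""
    return (current_bank + attacks) % num_banks


if __name__ == "__main__":
    # A filter tuned to 30 kHz ends up ringing at 14.1 kHz.
    print(fold_frequency(30000.0))
    block = [0.0, 0.03, 0.05, 0.0, -0.04, 0.0]
    print(next_bank(7, count_attacks(block)))
```

Because the fold is a mirror around Nyquist, raising a filter's nominal frequency past that point lowers the frequency that is heard, which is why a higher filter index does not imply a higher audible frequency.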

A second, slightly less “natural” voice is a live granulation of the reverberation of the first voice (incidentally, this reverberation is signal-driven and attack-responsive: in turn, an ever-changing space). Even if it is controlled by the right shoulder of the performer, its sonic nature is more abstract and the gestures much less “organic”, most of the time providing crescendos which culminate in evenly spaced short grains.
A third voice is the feedback system which can be heard towards the end of the first section of the performance.


As mentioned above, many parameters are controlled by the movement of the performer’s shoulders. This is done using a pair of stretch sensors attached to my pants:


The other end of the electrical cable is then connected to a voltage divider circuit connected to an Arduino board, as well explained in this tutorial:

The Max patch I used to interface Arduino is available here:

I found this solution very effective compared to other body motion tracking systems, first of all because they can cost thousands of pounds and secondly because it naturally provides a physical feedback of the stretching force.
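To make the sensor chain more concrete, here is a minimal Python sketch of how a stretch sensor in a voltage divider could be turned into a normalised control value. It is an illustration under stated assumptions, not the circuit or patch actually used: the 5 V supply, the 10-bit converter, the resistor values and the function names (`divider_voltage`, `adc_reading`, `normalise`) are all hypothetical.

```python
def divider_voltage(r_sensor, r_fixed, v_in=5.0):
    """Voltage across the fixed resistor of a two-resistor divider.

    The stretch sensor is assumed to sit on the supply side, so its
    resistance rising with stretch lowers the measured voltage.
    """
    return v_in * r_fixed / (r_sensor + r_fixed)


def adc_reading(voltage, v_ref=5.0, bits=10):
    """Convert a voltage to an integer code (10 bits and 5 V are assumed)."""
    max_code = (1 << bits) - 1
    return max(0, min(max_code, int(voltage / v_ref * max_code)))


def normalise(raw, raw_min, raw_max):
    """Scale a raw reading to 0.0..1.0 using calibration limits."""
    position = (raw - raw_min) / (raw_max - raw_min)
    return max(0.0, min(1.0, position))


if __name__ == "__main__":
    volts = divider_voltage(r_sensor=10000.0, r_fixed=10000.0)
    code = adc_reading(volts)
    # raw_min / raw_max would come from a calibration gesture.
    print(volts, code, normalise(code, raw_min=300, raw_max=722))
```

The calibration limits matter in practice because each performer and each sensor mounting gives a different resting and fully stretched reading.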


I worked on a set of four televisions fed with audio signal (scaled up 10000 times) to make them flicker. In most cases I did it using their SCART plug:


to do that I soldered some odd cables with a TRS jack on one end and a SCART plug on the other:


Here’s a video of a test:

For previous experiments, some of which ended up with nice videos, see my blog:

Lissajous and Forbidden Motion with PS3 Controller

The final version of the audio-visual performance system used for this performance expanded upon the Lissajous Organ presented in submission1.  I developed a second audio-visual instrument named Forbidden Motion.  By running distorted, beat-based noise through a subtractive synthesis process similar to Convolution Brothers’ ‘forbidden-planet’ and finally through Audio Damage’s EOS reverb, a rich, interesting sound was generated.


The high frequency sounds in this clip are a result of this process:

A simple ioscbank was also implemented to generate dense amounts of sine waves. Lastly, abilities to degrade the audio signal allowed for dirty, crunchy sonorities in the aesthetics of our cave theme.

I chose to use visuals typical of static on analog TVs for this part of my system.


By modulating brightness controls via audio input, these visuals responded to the audio output of this part of my system.
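One simple way to picture this kind of audio-driven brightness is a level follower whose output is clamped to a 0..1 brightness range. The sketch below is an assumption about how such a mapping could look, not a description of the actual patch: the block-based RMS measure, the `gain` parameter and the function names are hypothetical.

```python
import math


def rms(block):
    """Root-mean-square level of a block of samples in -1.0..1.0."""
    return math.sqrt(sum(sample * sample for sample in block) / len(block))


def brightness_from_audio(block, gain=4.0):
    """Map the level of an audio block to a brightness between 0.0 and 1.0.

    gain is a hypothetical sensitivity control: higher values make
    quiet material produce brighter static.
    """
    return min(1.0, rms(block) * gain)


if __name__ == "__main__":
    quiet = [0.0, 0.0, 0.0, 0.0]
    loud = [1.0, -1.0, 1.0, -1.0]
    print(brightness_from_audio(quiet), brightness_from_audio(loud))
```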


2 for 1 control

In developing a way to control my Lissajous Organ and Spectral Motion together, I stumbled upon a system of control in which I could control 16 systems of equal or greater size than the ones I used.  By packaging the data coming out from a controller and routing this data in an efficient way, simple controls can be mapped to many levels of parameters.



Macro routing:

Example of routing

Inside R1 Buffer routes- Gated Buffer routing:

Lock-Level Buffer gates

An unexpected result of using a system structured in this fashion was the ability to combine both visual systems together on the fly.


An important feature of this system of control was freedom from my computer screen.  This allowed for more gesture-driven and intimate interactions with ensemble members like the ones seen here:

Unfortunately, my visuals projected into the audience during this clip were not captured, but are the analog TV type discussed earlier.

More detail about the structuring philosophies of this system can be found in my Submission3 blog post.

Audio-visual spatialization

I chose to use a projector capable of movement so as to utilize the space in which we performed.  Here you can see it projected onto the floor:

In addition to the audio spatialization, visual space was also planned out so as to accentuate the performance space and leave room for each other’s visuals to stand out.

Problems encountered

Unresolved differences regarding simple work-place etiquette led to overwhelming, emotional stress and finally extreme verbal harassment the day before our final performance.  Although this course appears structured in a hierarchical fashion, in order to consult supervisors when needed, no coherent plan for conflict resolution resulted from this structure.  Even when approached multiple times with the same issue, repetitive advice received from my supervisor in regards to our issues of diversity was, “You cannot make anybody do anything.”  This advice efficiently dissolved the fragile bonds that existed between individuals with different backgrounds.  I lament not being able to overcome these issues and believe the paradigm of conflict-resolution in the DMSP course needs to be contemplated and restructured.

Spectromorphology and Audio-Visual Performance

Real-time audiovisual performance is governed by a multitude of variables that include, but are not limited to, performance model, technology integration and aesthetic goals.  During the course of our Audio Visual Ensemble’s digital media studio project, time and time again we were asked, “What makes a convincing audiovisual performance?”  Although this is a multifaceted topic with no single definitive answer, I will offer suggestions as to how spectromorphology can be used to analyze strategies for better developing convincing real-time performance.

To start, it is important to realize that in performance situations involving interaction between performers, the audience and their medium, what viewers perceive is not necessarily the same experience as for those performing. What may seem entertaining to the performers may be misunderstood or uninteresting to the audience, especially if the performance lacks a clearly defined theme and significant dynamic change over time.  Since my background is in sound composition and performance, I have been able to draw upon some parallels of similar concerns in electroacoustic music that assist in directing this discussion.

Like language, audiovisual content is constructed from extrinsic-intrinsic building blocks and the ability to interpret their message relies on the ability to communicate meaning effectively. Denis Smalley introduced the concept of spectromorphology as a tool to assess criteria for selecting sound material and organizing structural relationships that are linked to recognizable shared experiences outside of music (Smalley 1997).  Smalley explains, “How composers conceive musical content and form- their aims, models, systems, techniques, and structural plans- is not the same as what listeners perceive in that same music.”  This raises the question: what are universal shared experiences, and how might these ideas be conveyed in the realm of audio-vision?

Smalley believed that for performances which rely heavily on technology, where gesture and the corresponding results are not immediately apparent, the ability to adequately convey meaning is impaired by the inability to directly link action to a corresponding result (Smalley 1997).   He suggests that we should try to ignore the technology used in making music, and in this case, the performance, by recognizing that gesture and relationships between source and cause are a more substantial factor for communicating ideas effectively (Smalley 1997).

In a post-digital era, this is no easy feat as technology manifests on almost every level. Perhaps the paradigm has shifted since Smalley developed his initial ideas and now it is more salient to balance technology in ways that convey universal shared experience.  I propose that technology alone is not the underlying concern but that relying on technology without having developed a well-defined theme or aesthetic goal tied to significant meaning or gesture is of greater importance.

Spectromorphology is concerned with motion and growth process and points to trans-contextual interactivity of intrinsic and extrinsic relationships for answers.  These intrinsic-extrinsic relationships need not be limited to sonic or visual events themselves but can be extended to the gesture that defines them (Smalley 1997).  Smalley refers to this concept as source bonding, which relates sound structures to each other on the basis that they appear to have symbolic or literal shared origins.  Applied at both higher and lower levels of structure, physical or metaphoric gestures that clearly exemplify corresponding sonic and visual counterparts can be used to more clearly convey meaning, purpose and intent (Smalley 1997).  Translating these ideas to the audio-visual realm means linking sound and vision in ways that dynamically delineate cause and effect.  By extrapolating intrinsic perspective and translating meaning into extrinsic constructs, both the performers and audience gain deeper insight into function and significance of not only the main theme but also the individual contributing factors that make up the performance.

Whilst meaning plays an important role in disclosing intention, it cannot be convincingly conveyed without placing it into a larger structural context.  Smalley explains, “Motion and growth have directional tendencies which lead us to expect possible outcomes.”  He in turn breaks these into seven characteristic motions that relate to spectral relationships of sound production (Smalley 1997).

Motion Chart

Although it is beyond the scope of this brief essay to elaborate on each of these, it is important to note that these structural relationships can be applied to help establish dynamic trajectory in ways that strengthen structure to better convey meaning over time.  If applied to audio-visual performance, we can devise precise ways of mapping movement to states of tension and release based on relationships dictated by transitioning events in conjunction with their starting points.  Extrapolating from this idea, if we identify events according to gesture, movement and meaning, we can fit them into a larger dynamic structure and manipulate their placement to either fulfill or break expected outcomes.  This in turn can be used to create a more systematic approach and aid in building more elaborate and well thought out performance structures.

Stepping back from our performance and having time to contemplate its strengths and limitations, it has been beneficial to consider concepts pertaining to spectromorphology and how they could be applied to our work.  To a degree, because combined audio-visual performance was new to us, the group was short-sighted in its approach by placing too great an emphasis on developing the technology used therein.  Although we settled on a theme, specific details at times failed to correlate directly to any shared experience outside of the sounds and visuals themselves.  The level to which our performance could grow, change and evolve may be improved with more attention to how we guide transitions through directed movement, energy and gesture.  While spectromorphology does not provide a universal set of answers, it does afford some valuable tools for diagnosing structure and strategy.


Smalley, Denis. 1997. “Spectromorphology: Explaining Sound-Shapes.” Organised Sound 2 (2): 107–26.

AVE Video Documentation


In our final performance in the University of Edinburgh’s “Inspace”, we chose to model our audio-visual narrative around Plato’s “Allegory of the Cave.”  Individual performers were allowed a large degree of freedom to interpret the story on a personal level. The ensemble placed particular emphasis on designing instruments that cogently connected audio and visual aspects in ways that would remain convincing and engaging for both audience and performers alike.

Live Performance:

Also on YouTube: Audiovisual Ensemble Live Performance


Also on YouTube: A Documentary of Audiovisual Ensemble Project

The AVE 2014 ensemble aspired to avoid clichés associated with the theme and audio-visual performance in general. The “story” served mainly as a loose timeline to introduce transitions, build new textures, and to divide content more fluidly.  Strictly for rehearsal purposes, the following outline was developed.  Although not originally intended for this function, the outline can serve as a kind of “score” or road-map for examining our ideas on a more literal level.


Hierarchical System Design in Live Audio-Visual Improvisation

This article explores the usage of a hierarchical system to interact with audio-visual systems via digital mapping interfaced through a gaming controller.  It will investigate a system I developed called HLSC (Hierarchical Lindenmayer inspired Structure of Control), featured in the Audio Visual Ensemble 2014 project as part of the Digital Media Studio Project course at The University of Edinburgh.  It will not investigate hardware controllers, but will acknowledge the unique capabilities when pairing gaming devices with real-time, audio-visual systems.  The device used for the Audio Visual Ensemble project is a PS3 Dualshock controller.  The methodology of controller-assisted performance used by the HLSC infrastructure can be applied to any comparable hardware controller.

A note on gaming devices

Most modern gaming devices are designed for spontaneous, multi-dimensional interactions in a virtual gaming world.  For this reason, they are naturally inclined towards controlling similarly complicated interactions in the audio-visual realm. Todd Winkler (2001) writes about computer controllers:

“Thought should be given to the kinds of physical gestures used to send data to the computer, and how specific movements can best serve a composition….Playing new sounds with old instruments only makes sense if the old technique is valid for the composition” (p.37)

With the computer’s ability to create sound without a physical medium, controller decisions can and should be tailored to specific performance situations.  There are documented ways of using gaming controllers within the Max/MSP programming environment (Jensenius 2007, p.106).  However, most of these utilize simple and linear systems for users with a “plug-and-play” mentality.  While simple for immediate use as real-time performance tools, these systems have limitations when used to control multiple systems in creative and improvisatory ways. Jensenius elaborates:

“A common solution to master such a complex system, i.e. consisting of many different and connected parts, is to create presets that define a set of parameters that work well together. These presets may then be used for further exploration of the sound synthesis model. However, the result is often that many people only use the presets, and never actually control the sound model more than changing from one preset to another. This again makes for a static and point-based type of control” (p. 101).

This type of static control which relies heavily on presets was not conducive to the interactive aesthetic of the Audio Visual Ensemble 2014.  I developed the HLSC system to separate myself from the computer screen, control complex systems, and facilitate gesture driven interaction with ensemble members.

Inspiration for model

Originally inspired by structures produced by Lindenmayer systems, these tree-like hierarchies are a good way to visualize how this control system works.

Courtesy of Wikipedia:

Starting from level n=0, we see from this diagram that each level contains more and more elements.  Although originally designed to model organic growth, I imagined these levels as a parent-child system of control.  For example, the A and B in level n=1 control the elements in level n=2, which in turn control the elements in level n=3, etc.
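The parent-child reading of the levels can be sketched in Python. The example assumes that the diagram is the standard "algae" Lindenmayer system (axiom A, with rules A → AB and B → A), which is consistent with level n=1 containing A and B; the function names `lsystem_levels` and `children` are hypothetical and only serve the illustration.

```python
# Assumed rewrite rules of the "algae" L-system: A -> AB, B -> A.
RULES = {"A": "AB", "B": "A"}


def lsystem_levels(axiom="A", depth=3, rules=RULES):
    """Return the strings of levels n=0..depth, one per list entry."""
    levels = [axiom]
    for _ in range(depth):
        previous = levels[-1]
        levels.append("".join(rules.get(symbol, symbol) for symbol in previous))
    return levels


def children(levels, n, index, rules=RULES):
    """Return the part of level n+1 generated by one element of level n."""
    parent_level = levels[n]
    start = sum(len(rules.get(symbol, symbol)) for symbol in parent_level[:index])
    parent = parent_level[index]
    width = len(rules.get(parent, parent))
    return levels[n + 1][start:start + width]


if __name__ == "__main__":
    levels = lsystem_levels(depth=3)
    print(levels)                  # ['A', 'AB', 'ABA', 'ABAAB']
    print(children(levels, 1, 0))  # 'AB': what the A of level n=1 controls
    print(children(levels, 1, 1))  # 'A': what the B of level n=1 controls
```

Read as a control structure, each symbol is a parent whose rewritten substring is the group of child controls it opens at the next level.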

To apply this structure to a system of control, the ability to go from level to level was paramount.  By moving through levels, macro and micro controls can be accessed by the same device.  Further, the self-similar nature of the Lindenmayer structure allows different levels to have familiar shapes of control.  For the application of this structure to the PS3 controller, I utilized button combinations and bumpers to access and gate data flow.

Controller application

Please refer to this diagram from the official PS3 website for exact controller surfaces referenced in this section.

On this hardware controller, one stream of the HLSC system taken to its end can be represented as such:



Only one element of each level is expanded in this diagram, but each element of the same level could be expanded in an identical way.  A more conceptual representation of this structure is as follows:


This representation shows a multi-level, hierarchical structure similar to the Lindenmayer system seen earlier.  To move between levels and gate information for specific settings, a combination of home and bumper is used at the Top-Level, bumpers to gate information at the Lock-Level, and either alt or start at the Surface-Level to access alternative, lower level parameters.
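The level switching and gating described above can be approximated by a small router that turns the same physical control into different parameter addresses. This is a hedged sketch, not the HLSC patch: the class name `HierarchicalRouter`, its methods, the control name `left_stick_x` and the address layout are all hypothetical, and the exact button logic of the real system may differ.

```python
class HierarchicalRouter:
    """Route controller data to (top, lock, surface, control) addresses."""

    def __init__(self):
        self.top = 0       # Top-Level branch, chosen with home + bumper
        self.lock = None   # Lock-Level gate; None means the gate is closed
        self.values = {}   # last value stored for each address

    def select_top(self, home_pressed, bumper_index):
        """Change the Top-Level branch only while home is held."""
        if home_pressed:
            self.top = bumper_index

    def set_lock(self, bumper_index):
        """Open a Lock-Level gate; pass None to close it again."""
        self.lock = bumper_index

    def route(self, control, value, alt=False):
        """Store a value under the current address, or drop it if gated."""
        if self.lock is None:
            return None
        surface = "alt" if alt else "main"
        address = (self.top, self.lock, surface, control)
        self.values[address] = value
        return address


if __name__ == "__main__":
    router = HierarchicalRouter()
    print(router.route("left_stick_x", 0.5))  # None: no gate is open yet
    router.select_top(home_pressed=True, bumper_index=2)
    router.set_lock(1)
    print(router.route("left_stick_x", 0.5))
    print(router.route("left_stick_x", 0.9, alt=True))
```

Keeping the address as a tuple makes the one-to-many character visible: one stick axis can reach as many parameters as there are combinations of levels.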

For pictures of the patch and more details about its use in performance, please refer to my Submission2 blog post. To get the interface used to connect and package data from my PS3 DualShock controller, download the patches from this link:

Complexity and creativity

By using HLSC, levels of one-to-one and one-to-many interactions can be accomplished without physically touching a computer.  With the aesthetic goal of live interactivity for my performance with the Audio Visual Ensemble 2014, I needed to interact with multi-dimensional parameters of two visual systems, two musical systems, and combinations of these systems in real-time.

Despite working with so many systems and controls simultaneously, less than 10% of the possible controls in the HLSC system were used (only two of the “Top-Level” branches were partially utilized). However, the vastness of this system allowed me to make mapping decisions based on artistic decisions instead of system limitations. For example, I mapped orientation data to be controlled by gestures during more intimate sections where dialogue is especially important.

Although parameters were scrupulously mapped into a logical arrangement, at times I found myself forgetting exactly which surfaces of my controller accessed which parameters. However, these unpredictable results were often aesthetically rewarding. Electronic composer Brian Eno dreams of getting lost in a complex, self-generating system similar to HLSC:

“But what if the synthesizer just ‘grew’ programs? If you pressed a ‘randomize’ button which then set any of the thousand ‘black-box’ parameters to various values and gave you sixteen variations. You listen to each of those and then press on one or two of them—your favourite choices. Immediately, the machine generates 16 more variations based on the ‘parents’ you’ve selected. You choose again. And so on. The attraction of this idea is that one could navigate through very large design spaces without necessarily having any idea at all of how any of these things were being made” (Dahlstedt, 2007)

This passage demonstrates the creative potential of a complex system even when a performer is lost.  With simpler one-to-one mapping, all combinations of interaction can be quickly exhausted, but with a complex system like HLSC, the possibility for unexpected interactions is much greater and therefore can foster creative results.


The HLSC system is one that I will be utilizing and further developing in the future. Its multifunctional, high-level control allows for more meaningful audio-visual interactions than static presets. The ability to negotiate and improvise with many nested levels of control in real time makes it a valuable performance tool. Future application to the API of DAWs such as Ableton Live 9 or Bitwig could allow an average game controller to be used as a powerful production tool.

An interesting study for the future would be to develop rules for Lindenmayer systems of control based on a particular controller's specifications. By formulating and testing equations based on the number of surfaces and the dimensions in which those surfaces function, perhaps an HLSC-like system could be autonomously generated without any prior experience with a specific controller.
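One possible starting point for such an equation, assuming every level offers the same set of surfaces, is to multiply the total number of control dimensions by the number of choices available at each level. The sketch below is hypothetical: the function name, the surface list, and the fan-out figures are placeholders, not measurements of the actual HLSC patch.

```python
from math import prod


def addressable_parameters(surface_dims, fanout_per_level):
    """Count reachable parameters if every level repeats the same surfaces.

    surface_dims: mapping of surface name -> number of dimensions it reports.
    fanout_per_level: number of selectable branches at each level.
    """
    return sum(surface_dims.values()) * prod(fanout_per_level)


# Placeholder specification for an imaginary controller.
surface_dims = {"left_stick": 2, "right_stick": 2, "orientation": 3}
fanout = [4, 4, 3]  # placeholder: Top-Level, Lock-Level, Surface-Level choices

total = addressable_parameters(surface_dims, fanout)
print(total)
```

A generator built on this idea would still need rules for which surfaces suit which level, so the count is an upper bound on the design space rather than a finished mapping.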



Dahlstedt, Palle. 2007. Evolution in Creative Sound Design. In Evolutionary Computer Music. Eduardo Reck Miranda and John Al Biles, eds. Pp. 79–99. London: Springer. Accessed April 24, 2014.

Jensenius, Alexander Refsum. 2007. Action-Sound: Developing Methods and Tools to Study Music-Related Body Movement. Accessed April 24, 2014.

L-System. 2014. Wikipedia, the Free Encyclopedia. Accessed April 25, 2014.

Sony. n.d. [Diagram of PS3 controller]. Retrieved from

Winkler, Todd. 2001. Composing Interactive Music: Techniques and Ideas Using Max. New Ed edition. Cambridge, Mass.: MIT Press.

Audio-visual Animation

We have all experienced endless illusions when listening to music. A number of artists have devoted themselves to visualizing these illusions by producing dynamic images in the form of lines, graphics, or color tones. In this way the music itself is decorated with visual elements, which opens a new space for animation production.

Concept of Audio-visual Animation

Audio-visual animation, a kind of experimental animation, is an emerging form with strong visual effects. It contains diverse patterns of visual change and organizes its narrative structure in a nonlinear manner. Rather than acting as background, as it does in mainstream animation, music in an audio-visual animation holds a dominant position and even plays the role of screenwriter. The elements that appear in an audio-visual animation are designed according to the designers’ understanding of the music, which differs from the traditional production process of “script-painting-animation”.

Audiovisual Awareness in Animation

A prevalent approach in the audiovisual field is to concretize music as a motion picture of changing light and shadow, lines and shapes, and colors and shades, generated according to abstract geometry or mathematical formulas, so that the music presents its own character. Sound itself is a dynamic form of art whose structure shares many characteristics with dynamic images. In early animations the soundtrack was used to make up for insufficient dialogue or to enhance the fun of the characters’ look and pace. In that case the sound was chosen according to the development of the storyline and only served to expand the plot or regulate the atmosphere. In comparison, audio-visual animation has stricter requirements for sound. Since the design inspiration can come from one’s perception of sound, audio-visual animation designers prefer sound that is highly capable of creating mental pictures.

With new innovations in technology, audiovisual work is evolving from the music video to large, site-specific projections and live shows. Although it may seem like a recent phenomenon, its history goes back much further. Oskar Fischinger, an early 20th-century German painter and pioneering animator who is regarded as the father of visual music, specialized in avant-garde audiovisual films and is noted for his abstract shapes synchronized to music. Many of his animations were created using unusual materials such as colored liquid, filters, slides, and wax. He created Studies while he was working on special effects for the movie Woman in the Moon. Studies is a series of abstract short films that feature black-and-white forms synchronized to music. An excerpt from Studie nr. 8 is shown below.

Studie nr 8 (excerpt) by Oskar Fischinger

Some may argue that the general public cannot understand abstract art, yet as long as little shapes swoop or tremble or dance to music, they seem to have purpose and meaning. Fischinger therefore never tried to illustrate music, but to provide a visual equivalent. He created critically influential animations that drew on his strong sense of audiovisual awareness. His method of conveying sound and rhythm with color and shapes is the basis of visual music.

Methods in Making Audio-visual Animation

Running the animation in accordance with the laws of sound is a new method worth exploring. Visualizing auditory information through creative design can not only remove the pressure of script writing but also loosen design limitations, and it may lead to unique and striking effects. Compared with animation that has a fixed storyline, audio-visual conversion and integration leaves more freedom for improvisation. Additionally, different people understand and imagine sound differently, so to some extent the overlap of script and design ideas can be avoided.

Selecting sound that is highly capable of creating mental pictures is critical in this procedure. It requires us to explore and collect a variety of interesting sounds, or to record sounds from daily life and then edit them with audio software. This is an exploratory and experimental process that trains our sensitivity to and appreciation of sound, and there are many ways to do it. For instance, we could make up our own words to a familiar melody, collect sounds that do not appeal to us and then change their use of space to achieve harmony, or add sound effects to a piece of animation in an unconventional way.

Character design in most mainstream animation is based on the human body or other organisms, whereas in audio-visual animation these characters can be replaced by abstract visual forms. We can develop our own animation style by applying a series of basic elements, so that originally meaningless graphics gain expressive results through rhythm. For example, monotonous and simple abstract graphics could be associated with soft or alienated sound, while sharp graphics would match intense sound. Audio-visual animation basically explores animation effects in terms of the abstract and the realistic, transformation and deformation, and so on.

Overall, audio-visual animation requires designers to research intensively the functions and features of sound as well as the motion characteristics of graphics, which can only be achieved through long-term artistic experience and design training.


Robertson R. Eisenstein on the audiovisual: the montage of music, image and sound in cinema[M]. Tauris Academic Studies, 2009.

Oskar Fischinger: the animation wizard who angered Walt Disney and the Nazis

Woman in the Moon