Creating Audio-Visual Metaphor

Nature has given us the gifts of sight, hearing, smell and touch. These sensory perceptions make us human beings aware of what is happening in our surroundings and keep us away from danger so that we can survive. Thanks to the efficiency of nature, the total result we get from these senses is far more than the simple sum of each one.

In a piece of audio-visual artwork, the audio-visual language covers two fundamental types of sensory perception. What the audience expects when seeing the visual effects and hearing the sonic content is not only a simple rhythmic coherence between audio and visuals, but also the abstract concepts or metaphors represented or constructed both by the internal relationship between the audio and visual elements and by those elements themselves.

In terms of an audio-visual ensemble, we can consider audio and visual as two players in a band. Rather than producing sound that reacts to video, or playing video that reacts to audio, what this band could and should do is organize audio and visual elements into particular metaphors and integrate those metaphors to create the narrative of the performance. Instruments in a band are played according to a well-composed structure in order to achieve an effect of 1+1>2, and an audio-visual ensemble can work the same way. Audio and visual elements do not necessarily have to move in step with each other; they can be organized to complement each other.

The filmmaker Sergei Eisenstein maintained that in filmmaking audio and visual elements should not simply accompany each other in dependent synchronization, but should be structured into a more complicated composition (Robertson, 2011). As filmmaking is also a process of expressing audio-visual language, this theory can equally be applied to audio-visual live performance. Another area that deploys audio-visual language is the game industry. The mobile game Monument Valley provides a good example of audio and visual elements complementing each other. During puzzle solving, the controls produce notes as the player handles them, and different notes represent different degrees of change as a handle is turned, a difference that might not be obvious to the eye alone. In this way, sound contributes to the gameplay itself rather than simply being a sound effect that adds brilliance to an existing splendour.

Therefore, in the construction of an audio-visual ensemble performance, while rhythmic coherence shows the external link between the audio and visual elements, the creation and organization of metaphors reveals their internal integration.

As far as I am concerned, methods for creating audio-visual metaphors fall into two types.

One way is to use different combinations of audio and visual elements to create metaphors that represent specific physical events or emotional situations. For example, while the combination of soft music and a cloudy day might deliver a sense of sadness, intense music with a cloudy day could express anger. On the other hand, leisurely sound with a sunny day makes people feel relaxed, while rhythmic music with a sunny day makes them feel excited.

The other way is to represent a particular phenomenon or feeling with either an audio or a visual element, and to organize such elements together to create a context for the narrative. The storytelling flow can then be developed by composition within this context. In my experience, the creation of metaphor in our project followed this second way.

We chose the allegory of the cave as the framework of the narrative. While constructing the storyline, we allocated our existing resources as representations of different elements in the allegory. The audience was placed in the situation of the character in the allegory, the prisoner chained in the cave. Graphics on the main screen represented the subjective consciousness of the character, who was only allowed to see shadows in front of him. On the screen there were lights controlled by audio input, and shadows that came along with the lights. The shadows represented the character's opinion of the world, even though they were actually illusions. The lights provided an environment for the narrative: through their flickering and changes of colour, they set the "keynote" for each section of the story flow. At the same time, video projected on the surrounding walls was an abstract metaphor of the reality outside the cave, and its combination of abstract lines and colours revealed the clash and struggle of opinions.

While the sonic content structured the story flow, the visual elements filled the space, illustrating each narrative section with a different combination of metaphors. At first, as the prisoners were only able to see the shadows in front of them, the visual effects existed mainly on the main screen, and the shadows in this part seemed scary and evil. As the story developed, struggle appeared in the character's thinking through the lines and colours. When the character escaped from the cave, video was displayed on the walls and the shadows were muted. With the return to the cave, the three visual elements finally appeared at the same time, as new and old theories clashed and created a dramatic conflict. At the end of the performance, while the video on the walls implied a probably sad ending for the innovator, the shadows on the screen showed symbols of the beauty of the real world, implying that the seed of wisdom had already been planted in people's minds.

By creating audio-visual metaphors and allocating them appropriately along the timeline, the audio and visual elements of the project were finally integrated tightly. However, there are still many details to improve. In my opinion, the combinations of metaphors could be further diversified, and the content of the metaphors in different sections could be better differentiated to achieve a stronger dramatic effect.


Robertson, Robert. 2011. Eisenstein on the Audiovisual: The Montage of Music, Image and Sound in Cinema. New York, NY: I.B. Tauris.

Gadassik, Alla. 2013. "A Review of Eisenstein on the Audiovisual: The Montage of Music, Image and Sound in Cinema." Quarterly Review of Film and Video 30 (4): 377–381.

AVE Video Documentation


In our final performance in the University of Edinburgh's "Inspace", we chose to model our audio-visual narrative around Plato's "Allegory of the Cave." Individual performers were allowed a large degree of freedom to interpret the story on a personal level. The ensemble placed particular emphasis on designing instruments that cogently connected audio and visual aspects in ways that would remain convincing and engaging for audience and performers alike.

Live Performance:

Also on YouTube: Audiovisual Ensemble Live Performance


Also on YouTube: A Documentary of Audiovisual Ensemble Project

The AVE 2014 ensemble aspired to avoid clichés associated with the theme and with audio-visual performance in general. The "story" served mainly as a loose timeline to introduce transitions, build new textures, and divide content more fluidly. Strictly for rehearsal purposes, the following outline was developed. Although not originally intended for this function, the outline can serve as a kind of "score" or road map for examining our ideas on a more literal level.


Plan for our documentary

The goal of the documentary is to briefly introduce our project and explain how we executed it. The documentary would consist of the following parts:

– The concept of the audiovisual ensemble (that we are trying to demonstrate in this specific performance);

– The narrative line (the allegory of the cave) and the audiovisual metaphors;

– The role of each group member and an introduction of their "instruments" (from both technological and aesthetic aspects);

– Footage of rehearsals (to show some ensemble effects) and the experience of the whole process;

– Audience feedback (if possible).


The content could be organized as follows:


1. Concept:

Cross-cut clips from rehearsals and the performance showing some audiovisual effects

Interview 1: introducing the project concept

Full view of performing together (interview continues as voice-over), cross-cut with discussion and setup.

2. Narrative:

A specific part of the performance closely related to the story of the cave

Interview 2: the allegory of the cave (keep it brief; the version we are using)

A picture (or a simple animation if possible) showing the cave and prisoners (voice-over from interview)

3. Metaphor:

Interview 3: visual metaphors (shadow as illusion, video as the real world)

With details of the visual aspect (interview as voice-over)

Interview 4: audio metaphors

With details of playing the instruments as well

4. Instruments:

(Possibly cross-cut with the metaphor part)

Everyone playing his/her role in rehearsal/performance, details of the instruments (close-up shots), interaction with others, setup…

With interviews about: your role, your instrument, your feelings…

(This section covers each person in the group. Although it seems quite short in this draft, it is actually the main substance of the whole documentary, as it shows how and why we are doing these things.)

5. Feedback:

Shots of a prepared stage

Effects of the final performance

Feedback from audience

Feedback about the final performance from performers

6. Ending


This is only a draft for the documentary. I hope it will help us prepare and collect relevant materials. If you are confused about any of the points or have suggestions for improving it, please do not hesitate to tell me.

Here are also some questions for you to prepare for. Please try to keep your answers brief, because the documentary will only be around 5 minutes long.

For everyone:

  1. What are you doing in the performance, and how are your outcomes related to the metaphors?
  2. Introduce your instrument.
  3. How do you feel about doing this project?
  4. Which part of the performance do you like most?
  5. Whatever else you would like to talk about…

Besides, here are some questions that should be addressed in the documentary but not necessarily discussed by everyone. I have attached my suggestion about who could talk about each one. If you feel uncomfortable with it, please just let me know and we can try another arrangement.

Introduction to the concept of audiovisual ensemble (Russell?)

Allegory of the cave (Timo?)

How to achieve ensemble automatically (Marco?)

How to achieve ensemble manually (Shuman?)

Audio Visual Instrument/Control System – Colour Tracking


At the start, my aim was to build a digital instrument that could tie the screen and the stage space together, so that the audience would experience an audio-visual theatre performance rather than a real-time music video. To avoid the awkward situation in which performers just stand on a dark stage operating computers, with no visible interaction with the content they show to the audience, I attempted to develop a system that could smoothly embed the performer into the stage and make him/her an inseparable part of the whole performance. Inspired by the Electronic Theatre performance "Oedipus – The Code Breaker" at the Real Time Visuals Conference on 24th January 2014, I realized that one way to connect the performer and the screen is to record the action of the performer on stage and add real-time feedback into the video on the screen. This led me to the idea of a live video tracking system. The system captures, in real time, the motion of objects controlled by the performer, or even of the performer himself/herself, and feeds the data into programs that generate sound and graphics as feedback. In this way the system can also be considered an instrument.


From the Jitter tutorial documents in Max/MSP, I found that one way to do motion tracking is to follow the trace of a colour. The jit.findbounds object finds the position of visual elements within a specific colour range in a video, which can also be the real-time video from a camera. It outputs a set of data that can be used to manipulate or generate the audio and video output.

Here is a screenshot of the whole Max patch:

Screen Shot 2014-02-27 at 9.40.40 PM

This patch consists of three sections: the colour tracking part, the graphics generating part and the sound generating part. The same set of data is sent into both the audio and video sections at the same time to manipulate the parameters for different effects.

The colour tracking section can itself be divided into three parts: video input, colour picker and position tracker. The video input accepts data from different sources such as cameras, webcams or video files. The colour picker can be set either by clicking directly on the colour pad or via the Suckah object masked over the video preview window. The position tracker finds the top-left and bottom-right points of that colour range and outputs them. In this patch I use some mathematical expressions to transform this data into the centre and size of the colour region, so that we can locate its position more precisely.
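As an aside, the corner-to-centre transformation applied to the tracker's output can be sketched outside Max in plain Python (a minimal illustration of the arithmetic; the function name is hypothetical and not part of the patch):

```python
def centre_and_size(top_left, bottom_right):
    """Convert a bounding box, as output by an object like jit.findbounds,
    into a centre point and a size, which are easier to map onto audio and
    visual parameters than the raw corner coordinates."""
    (x0, y0), (x1, y1) = top_left, bottom_right
    centre = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)   # midpoint of the box
    size = (abs(x1 - x0), abs(y1 - y0))           # width and height
    return centre, size

# e.g. a blob bounded by (40, 60) and (120, 180):
centre, size = centre_and_size((40, 60), (120, 180))
# centre -> (80.0, 120.0), size -> (80, 120)
```

In the patch the same arithmetic is done with expr-style objects; the point is that centre and size change more smoothly and meaningfully than the individual corners as the tracked object moves.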

The audio part of this patch was made by Russell Snyder, who built an audio adjusting section and a sound generating patch based on the concept of a river. With these functions, the colour position data is used to adjust the panning and volume of the sound, and at the same time is mapped into the sound generation.
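As a rough illustration of that kind of mapping (a hedged sketch of the idea only, not Russell's actual patch; the specific ranges are assumptions), a blob centre in frame coordinates could drive pan and volume like this:

```python
def position_to_pan_volume(cx, cy, width, height):
    """Map a colour-blob centre to stereo pan (-1..1, left to right) and
    volume (0..1, louder towards the top of the frame). This particular
    mapping is illustrative, not the one used in the performance patch."""
    pan = (cx / width) * 2.0 - 1.0     # horizontal position -> stereo pan
    volume = 1.0 - (cy / height)       # vertical position -> volume
    # clamp to valid ranges in case the blob drifts out of the frame
    pan = max(-1.0, min(1.0, pan))
    volume = max(0.0, min(1.0, volume))
    return pan, volume

# a blob at frame centre gives centre pan and half volume
pan, volume = position_to_pan_volume(320, 240, 640, 480)
# pan -> 0.0, volume -> 0.5
```

The same normalized values could just as easily be routed to synthesis parameters, which is what makes a single tracking source usable by both the audio and the video sections at once.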

Previously I built a section that drew rectangles using the colour position data, but it did not seem coherent with the sound effects, so I looked for other graphics. The Jitter Recipes show some examples of generating stunning visual effects, and I adapted one of them, Party Light by Andrew Benson, into this patch to make the demo video.


Attached here is a demo video of experiments with audio-visual effects using this colour tracking instrument:

Colour Tracking Experiments – AVE 2014 – submission 1 – jz

We did three takes to explore different possibilities under different audio and video settings. There is still room to discover new possibilities with this patch.


At this stage we did develop an effective motion tracking system by tracking the movement of a colour. It can either be played on its own as an independent instrument, or be combined with the work of other group members to open up possibilities for the final performance. With larger coloured objects, performers would be able to perform on stage while being tracked by the system. By this means it is possible to combine gestures and digital audio-visual effects into a coherent performance.

However, there are still some things to be improved:

1. Sometimes the position data still flickers, which can cause some noise. I should improve the patch to smooth the changes in the data.

2. The level of brightness strongly influences the performance of the colour tracking system; it performs much better in a brighter environment. As the stage for the final performance will be quite dark, I have to find a way to improve the video recognition in poor lighting.

3. So far the diversity of the graphics seems limited, so more options for interaction will be added later.
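For the flickering position data mentioned in the first point, one common smoothing approach (an assumption on my part, not yet in the patch; Max's slide object does something comparable) is a simple exponential moving average over the incoming values:

```python
class PositionSmoother:
    """Exponential moving average: each new reading is blended with the
    previous smoothed value, damping frame-to-frame flicker at the cost
    of a little lag."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha   # blend factor, 0..1; lower = smoother but laggier
        self.value = None

    def update(self, reading):
        if self.value is None:
            self.value = reading   # first reading passes straight through
        else:
            self.value = self.alpha * reading + (1 - self.alpha) * self.value
        return self.value

s = PositionSmoother(alpha=0.5)
s.update(100.0)   # -> 100.0 (first reading)
s.update(120.0)   # -> 110.0 (halfway towards the new reading)
```

One smoother per coordinate (x and y of the blob centre) would be enough, and tuning alpha trades flicker suppression against responsiveness to the performer's gestures.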