Audio Visual Instrument/Control System – Colour Tracking


At the outset, my aim was to build a digital instrument that could tie the screen and the stage space together, so that the audience would experience an audiovisual theatre performance rather than a real-time music video. To avoid the awkward situation in which performers just stand on a dark stage operating computers, with no visible interaction with the content they show to the audience, I set out to develop a system that could smoothly embed the performer into the stage and make him or her an inseparable part of the whole performance. Inspired by the Electronic Theatre performance “Oedipus – The Code Breaker” at the Real Time Visuals Conference on 24th January 2014, I realized that one way to connect the performer and the screen was to record the performer’s actions on stage and add real-time feedback to the video on the screen. This led me to the idea of making a live video tracking system. The system captures, in real time, the motion of objects controlled by the performer, or even of the performer himself or herself, and feeds the data into programs that generate sound and graphics as feedback. In this way the system can also be considered an instrument.


From the Jitter tutorials in Max/MSP, I found that one way to track motion is to follow a colour. The object jit.findbounds finds the position of visual elements within a specific colour range in a video, which can also be a live feed from a camera. It then outputs a set of data that can be used to manipulate or generate audio and video.

Here is a screenshot of the whole Max patch:

Screen Shot 2014-02-27 at 9.40.40 PM

This patch consists of three sections: the colour tracking part, the graphic generating part and the sound generating part. The same set of data is sent into both audio and video sections at the same time to manipulate the parameters for different effects.

The colour tracking section can itself be divided into three parts: video input, colour picker and position tracker. The video input accepts data from different sources such as cameras, webcams or video files. The colour picker lets you choose a colour either by clicking directly on the colour pad or through the Suckah object overlaid on the video preview window. The position tracker finds the top-left and bottom-right points of the region in that colour range and outputs them. In this patch I use some mathematical expressions to convert these two points into the centre and the size of the coloured region, so that we get the position more precisely.
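The conversion from the two corner points to a centre and a size can be sketched in Python as follows. This is only an illustration of the arithmetic, not the expressions used in the Max patch; the names `Blob` and `bounds_to_blob` are hypothetical, and the corners are assumed to be inclusive pixel coordinates.

```python
from typing import NamedTuple


class Blob(NamedTuple):
    """Centre and size of a tracked colour region (hypothetical helper type)."""
    cx: float    # centre x, in pixels
    cy: float    # centre y, in pixels
    width: int   # width, in pixels
    height: int  # height, in pixels


def bounds_to_blob(left: int, top: int, right: int, bottom: int) -> Blob:
    """Convert top-left and bottom-right corners into a centre and a size.

    Assumption: both corners are inclusive pixel coordinates, so a region
    covering a single pixel has width 1 and height 1.
    """
    if right < left or bottom < top:
        raise ValueError("bottom-right corner must not precede top-left corner")
    return Blob(
        cx=(left + right) / 2,
        cy=(top + bottom) / 2,
        width=right - left + 1,
        height=bottom - top + 1,
    )


if __name__ == "__main__":
    print(bounds_to_blob(10, 20, 30, 60))
```

The centre is a better control value than either corner because it moves with the object rather than with the edge of the detected region, while the size can serve as a rough stand-in for distance from the camera.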

The audio part of this patch was made by Russell Snyder, who built an audio adjusting section and a sound generating patch using the concept of a river. Using these functions, the colour position data adjusts the panning and volume of the sound and is at the same time mapped into the sound generation.

Previously I built a section that drew rectangles from the colour position data, but it did not seem very coherent with the sound effects, so I looked for other graphics. The Jitter Recipes offer some examples of generating stunning visual effects, and I adapted one of them, Party Light by Andrew Benson, in this patch to make the demo video.


Attached here is a demo video of experiments with audiovisual effects using this colour tracking instrument:

Colour Tracking Experiments – AVE 2014 – submission 1 – jz

We did three takes to explore different possibilities under different audio and video settings. There is still room to discover new possibilities with this patch.


At this stage we have developed an effective motion tracking system based on tracking the movement of colour. It can either be played on its own as an independent instrument or be combined with the work of other group mates to develop possibilities for the final performance. With larger coloured objects, performers would be able to perform on stage while being tracked by the system. This makes it possible to combine gestures and digital audiovisual effects into a coherent performance.

However, some things still need to be improved:

1. Sometimes the position data still flickers, which can cause noise. I should improve the patch to smooth the changes in the data.

2. The brightness level strongly influences the performance of the colour tracking system, which seems to work much better in a brighter environment. As the stage for the final performance will be quite dark, I have to find a way to improve the video recognition in low light.

3. So far the variety of graphics is rather limited, so more options for interaction will be added later.
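For the first point, one common way to calm flickering control data is an exponential moving average (a one-pole low-pass filter). The Python sketch below shows the idea only; it is not part of the patch, the class name `Smoother` and the default `alpha` are assumptions, and in Max a similar result could probably be reached with a built-in smoothing object.

```python
from typing import Optional


class Smoother:
    """Exponential moving average for a stream of position values."""

    def __init__(self, alpha: float = 0.2) -> None:
        # alpha close to 0: heavy smoothing, slow response.
        # alpha equal to 1: no smoothing at all.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, x: float) -> float:
        """Feed one new reading and return the smoothed value."""
        if self.value is None:
            self.value = x  # first reading: nothing to average with yet
        else:
            self.value += self.alpha * (x - self.value)
        return self.value


if __name__ == "__main__":
    s = Smoother(alpha=0.5)
    for reading in [0.0, 10.0, 10.0, 10.0]:
        print(s.update(reading))
```

The trade-off is latency: a smaller `alpha` removes more jitter but makes the sound and graphics lag further behind the performer, so the value would have to be tuned by ear and eye.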

From Audio/Vision to Transmedial Experience

Many different ways of approaching the idea of an Audio/Visual Ensemble are possible and it might be argued that vision and music were originally united in performative and ritual practices: for instance, archeological studies about prehistoric art deal with the same time scale in dating the birth of both painting and music. Moreover, many ritual practices around the globe involve this unity: a big fire, a circle of people playing musical instruments, other people dancing, clapping hands, singing along in colourful dresses, casting shadows around: all that is food for eyes and ears. Actually, also for nose and skin (think of the fire heat, of the contact with others’ skin and with the ground…). All that is no different from “occidental” disco clubs.

The way our ensemble approaches audio/visual projects directly comes from the idea of preserving this original unity: actually, we do not aim at recomposing it a-posteriori, but rather at making it the starting point from which to grow different fronds, all belonging to the audio/visual realm. Therefore, we understand the “/” sign as a continuum consisting of a range of experiences (including the strictly auditive and visual ones) that can be freely navigated. Free navigation, however, may well lead us out of the map: that is to say, other expressive forms may at a certain point be included in our projects, for example spoken words or performative arts. To be ready to greet them we should start to think of our group as a Transmedial Ensemble.

As a matter of fact, some of our projects already include performative elements. Timo’s instrument, for example, consists of a system whose output massively involves light and sound; at the same time, though, it can only be appreciated considering its performance-driven nature. The source of it all is Timo playing a bass guitar – an action that, indeed, is one of the most ancient and well recognised ways of performing. Its performative nature becomes even more clear as Timo makes use of unconventional gestures. Russell and Jessamine have been working together on two different projects that go in an even more abstract direction, leaving behind the idea of playing a musical instrument and freeing the performative element. Marco’s system can be placed somewhere in between, as the physical interaction that “plays” it can vary from a percussive-instrument style to more “theatrical” gestures.

Submission 1: including analog

“Get away from the computer screen” one day Martin said, “for instance, you could make this TV flicker” he added. I had no idea how that could be done, but I had always been fascinated by small, old tv screens lying in a corner, maybe in a group of three or four, in exhibitions of different kinds and atmospheres. Videoart, they call it. So I asked him, and he replied: “touch these contacts with an audio cable”. The next thing I knew, I was in front of a screen (an analog one, though) that seemed to appreciate the wild noise I was feeding it in a very responsive way.
I then had the idea of putting together a small, local audio/visual unit, by placing a small speaker on the monitor. The idea is to have many of these units scattered around a performative space, maybe on the sides or behind the audience, and use them to create a counterpoint with the main stage. While I was experimenting with this setup, I discovered an interesting phenomenon: as I tapped on the surface of the speaker, a signal was generated that made the tv screen flash quite intensely. Actually the tv and the speaker receive exactly the same signal, which translates at the same time into light and sound: this was achieved through a physical connection of the two, a connection that eventually allowed the use of the speaker as a vibrational pickup (basically a bad dynamic mic). I ended up discarding this feature, but I took two different ideas from it: first of all, the use of contact mics to pick up physical interaction between a performer and an object (this is something I did before, and I process the signal quite heavily); secondly, the idea of multidirectional flows of electrical signal, which eventually made me use an EM microphone (an electric guitar one) to pick up the EM field variations produced by the tv screen as it flickered. This signal is then processed in a very similar way to the contact mic one and fed back to the screens and the speakers, but it also enters a digital feedback system whose sound is then sent to main amplification.

A large part of the hacking happened in collaboration with Russell, while the guitar pickup belongs to Wolfgang Thomas.

AudioVisual Study: Silent Rhythm


We all have the experience that some pieces of music remind us of particular scenes: we can imagine the picture in our heads without seeing a real image. What about the opposite? Could we imagine the sound when watching silent videos? If you’re reading this sentence silently, chances are that you’re imagining a voice speaking the words you are reading. Watching videos could be viewed as another form of reading, where words are replaced by a series of images. It’s conceivable that sound could form in our heads even when we are watching silent videos.


We know that sound is a vibration that propagates as a mechanical wave of pressure and displacement through some medium. Imagine that we are in outer space: we might feel a vibration, but no sound can be heard because there is no air to act as the medium. That we cannot hear a sound doesn’t necessarily mean we cannot feel it. Some findings suggest that the experience deaf people have when ‘feeling’ music is similar to the experience other people have when hearing music. The perception of musical vibrations by the deaf is likely every bit as real as the equivalent sounds, since they are ultimately processed in the same part of the brain.

For this study, some experimental video clips were made to investigate the correlation between the auditory and visual senses. Vibrating objects are of crucial importance in these videos. I started by making 3D models in Blender, where the vibration effect is achieved by baking sound to an F-curve (a function in Blender). The sound source is <Pulse> by RTPN.

3D cubes

Here is the performance of these cubes. The sound is not exported.

This video has a strong rhythm, which is reflected in the fierce vibration and the changing sizes and colors of the cubes. As expected, I feel a sound of strong beats, expressing excitement or thrill, in my head when watching this video.

I moved on to get some inspiration from real things in daily life. We already have a library of sound assets built up through our life experience. We know what it should sound like when waves lap the shore, a ping-pong ball bounces on the floor, or someone knocks on the door… When these kinds of scenes appear on the screen, we can imagine the sound even if the video itself is silent.

The following video is inspired by tunnels: the real tunnel and the time tunnel. It makes me feel that I’m in a car on the run but can’t escape anyway. The imaginary sound trembles with fear. However, this video is rather abstract and the audience may not feel any specific sound in their heads, which made me realize that concrete objects like waves and explosions may be better at suggesting sound as well as producing an audiovisual effect.

Building on the earlier experimental videos, I decided to combine 3D models and real scenes. As Cinema 4D files can be imported directly into After Effects CC, I chose C4D rather than Blender to make models such as vibrating balls, waves, smoke, etc.

Screen Shot 2014-02-27 at 9.46.51 PM
Screen Shot 2014-02-27 at 9.42.27 PM
Screen Shot 2014-02-27 at 9.40.00 PM

The sound is filtered out, yet the balls in the video have a strong rhythm and the scenes switch frequently, both of which create a feeling of intensity. You can also imagine the sound when the huge ball falls to the ground and the floods submerge the screen. It’s interesting that the video is silent, yet you cannot feel any peace.

Software used: Cinema 4D, After Effects, Photoshop, Premiere.

Plugins: Red Giant.



During this journey of “silent rhythm”, you must have experienced the feeling of sound in your head. Sound is everywhere: it keeps popping into your head unconsciously even in quiet conditions, and it takes on specific beats when you are watching silent videos alive with rhythm. There are a thousand kinds of sound in a thousand viewers’ heads.


Although I’m satisfied with the results of the videos, it has to be noted that the process, including the 3D modeling, video editing and rendering, is time-consuming. Especially considering the interactive requirements of our project’s final performance, the operability and conformability of the work are open to question. I need to think about more practical and efficient ways to present the idea and make the work fit a live performance environment. Regardless of the form, the more important thing is the idea behind these videos: what silent rhythm is, and how it can be “heard” and felt by people. I’m keen to see how this idea can work together with the other group members’ work.

Audio Visual Instrument- Bass Lamp


To develop sonic occurrences that communicate direct correlation to visual counterparts, perceptively inseparable in intent, and to acknowledge silence and the absence of visual stimuli as a necessary and effective contrast.

Gaining inspiration from Tim Ingold’s article “Against Soundscape,” and discussions with Martin Parker, I became fascinated with the idea of manipulating light, rather than image, to see how this might prove compelling in its own right.

Tim Ingold observes, “It is of course to light, and not to vision, that sound should be compared. The fact however that sound is so often and apparently unproblematically compared to sight rather than light reveals much about our implicit assumptions regarding vision and hearing, which rest on the curious idea that the eyes are screens which let no light through, leaving us to reconstruct the world inside our heads, whereas the ears are holes in the skull which let the sound right in so that it can mingle with the soul.”

Even with our eyes closed it is still possible to perceive flashes, flickering or the presence of light and gain some indication that there is activity and movement. Light can be projected onto surfaces, broken by other objects, used to induce shadows, or add subtle touches begetting mood or ambience.

My Role

Construct a system that is highly responsive, expressively dynamic and diverse, which can be improvised in real time using a “traditional” instrument (standard bass guitar) to play electronic sounds that trigger and modulate specific DMX light movement in a way that directly connects sound and vision (light).  Submission one outlines my progress, observations, trials, tribulations, and aims to discuss plans for further development.

DMX setup

When I began this course, I had no previous experience with DMX and needed to conduct an extensive amount of research to overcome numerous technical issues to get my system working. Step one was to get four VISAGE 0493 LED lights controlled remotely through Ableton Live running DMX Max for Live devices. Most of the preliminary documentation is explained in detail on my AVE blog.

DMX First Run

DMX Progress

DMX… a bit further


The easiest way to bridge the connection from Live to the DMX LED lights was to send MIDI data out of an Audio/MIDI interface and into the school’s DMX console, which converts MIDI to DMX. After modifying Matthew Collings’ Max for Live patches to accommodate four lights, I was to some extent able to control them from within Ableton Live.

The easiest way is not necessarily the best way, as the DMX lights responded sluggishly and with high latency. The lights would at times remain on when switched off and were unpredictable and difficult to control precisely. Additionally, they would flicker intermittently and pulse on and off of their own accord. After speaking with Collings, he confirmed having the same issue, which he was not able to resolve.

Matt M4L DMX Devices

Although I experienced limitations, being able to control only two channels with the DMAX devices via the Enttec DMX USB Pro, the setup was much more responsive, less latent, did not flicker, and handled more accurately. Seeking perfection, I went back to troubleshooting the Enttec box and, after much tinkering, discovered that the issue was with Olaf Matthes’ Max/MSP external ‘dmxusbpro’. I was able to overcome the channel limitations by using a beta abstraction by David Butler, imp.dmx, that uses Jitter matrices to store, read and write data rather than reading straight MIDI values. Using the imp.dmx help file, I turned this into a 27 channel (four lights- each 7 channels) Max for Live Device.

T-Ø_DMX M4L Device Presentation

T-Ø impdmx Max

Up to this point, the Enttec setup has been more stable and the device functions somewhat as intended. I did, however, need to limit the number of channels to 27 instead of 512 to accommodate a higher frame rate, so as not to overload the device when modulating large amounts of control data.
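A rough calculation of DMX512 line timing illustrates why sending fewer channels leaves room for a higher refresh rate. The Python sketch below assumes the standard 250 kbit/s line rate with 11-bit slots and uses minimum break and mark-after-break values as assumptions; it gives a theoretical ceiling only, and the actual bottleneck in this setup may well lie in the Max patch or the USB interface rather than on the DMX line.

```python
BIT_US = 4             # duration of one bit at 250 kbit/s, in microseconds
SLOT_US = 11 * BIT_US  # one slot: 1 start bit + 8 data bits + 2 stop bits
BREAK_US = 88          # assumed minimum break before each packet
MAB_US = 8             # assumed minimum mark-after-break


def max_refresh_hz(channels: int) -> float:
    """Theoretical maximum packets per second for a given channel count."""
    if not 1 <= channels <= 512:
        raise ValueError("a DMX universe carries 1 to 512 channels")
    # One extra slot is needed for the start code that precedes the data.
    frame_us = BREAK_US + MAB_US + (channels + 1) * SLOT_US
    return 1_000_000 / frame_us


if __name__ == "__main__":
    for n in (512, 27):
        print(f"{n:3d} channels: about {max_refresh_hz(n):.0f} packets per second")
```

Under these assumptions a full 512-channel universe tops out at roughly 44 packets per second, whereas a 27-channel packet is short enough that the line itself stops being the limiting factor.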

Audio Setup

The way in which a performer interacts in real-time performance adds another dimension to the visual component. I aim to hide my light-emitting, distracting computer from audience view and have toyed with the idea of performing behind a screen back lit by DMX lights (see video on DMX improv with shadows). Although there is still much work to be done refining a setup that will allow me to do so, I have put together a working model that uses audio to MIDI conversion to control virtual instruments. The bass’s audio input can additionally be added as another voice, processed and manipulated in real time.


Ableton Live 9
PUSH Controller
Korg NanoKontrol
Roland EV-5 Foot controller
Max For Live
Bass Guitar
SoftStep Foot Controller
NI Virtual Instruments

Audio Setup

Instrument voicing:

CH 1- Bass Audio Input (Amp simulated bass sounds, Rhythmic Clicks/Beats, Distorted) 
CH 2- Sub Bass (Sustained or Arpeggiated)
CH 3- Pads, Leads, Atmospheric Noise

T-Ø AVE instrument Live

Mapping Sound to Light

As a means to bridge visual and sonic events, I have experimented with different methods of mapping audio frequency to DMX control data.

EX 1- Three Lights, Three Voices, Three Colors

In the first example, using the DMX console setup, I’ve daisy-chained and mapped three different colored lights (Red, Blue, White). Each responds to a different audio source in Live, routed to an envelope follower that is mapped to control an individual light’s DMX values (0-255).

Rhythmic- Red > Light 1 (right)
Sub- Blue > Light 2 (middle)
Noise Lead – White > Light 3 (left)
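The chain of audio source, envelope follower and DMX value described above can be sketched in Python as follows. This is an illustration of the general technique, not the Max for Live devices used here; the function names, the attack and release times, and the `floor` and `ceiling` scaling parameters are all assumptions.

```python
import math
from typing import List, Sequence


def envelope(samples: Sequence[float], sample_rate: int,
             attack_ms: float = 5.0, release_ms: float = 120.0) -> List[float]:
    """One-pole peak envelope follower; returns one level per input sample."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for s in samples:
        x = abs(s)  # rectify the signal
        # Rise quickly when the signal gets louder, fall slowly when it fades.
        coeff = attack if x > env else release
        env = coeff * env + (1.0 - coeff) * x
        out.append(env)
    return out


def to_dmx(level: float, floor: float = 0.0, ceiling: float = 1.0) -> int:
    """Scale an envelope level in [floor, ceiling] to a DMX value 0..255."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    norm = (level - floor) / (ceiling - floor)
    norm = min(max(norm, 0.0), 1.0)  # clip anything outside the range
    return round(norm * 255)


if __name__ == "__main__":
    burst = [1.0] * 1000 + [0.0] * 1000  # a loud burst followed by silence
    levels = envelope(burst, sample_rate=44100)
    print(to_dmx(levels[999]), to_dmx(levels[-1]))
```

Raising `floor` would keep a light fully dark during quiet passages, and lengthening `release_ms` would make it fade rather than cut, which are the kinds of adjustments that affect how wide the dynamic range of the lighting feels.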

Findings: Although this scenario might be interesting for a short period of time, it did not communicate an expansive dynamic range of expression. However, projecting onto an object or wall might be worth further investigation.

EX 2- Screen, Lights, Proximity

Wanting to explore greater dimension and possibility, I brought in a huge 10 by 10 foot back projection screen borrowed from LTSTS (no easy feat to transport or assemble).


Due to its size, we were limited to conducting experiments in a bright, noisy Atrium in Alison House. Below are two video examples of a 3 light setup without audible instrumentation.

Lights_Screen_No Sound_Close

Lights_Screen_No Sound_Distant

Findings: In this well lit environment, when the LEDs are off you can see a grey background caused by the screen itself.  A dark space is needed for this to be optimally effective.  Additionally, the effects of the lighting change with proximity.  It could prove interesting to stagger light distances at different stages of the performance.

EX 3- Giving Sounds Color and Movement

Using a Sony handy-cam we filmed in a lit Atrium. I am using two synced lights and the improved Enttec DMX Pro setup controlled by 3 different instrument voices. Each instrument voice is assigned a color.  The control is driven by individual audio output, linked to a corresponding envelope follower and mapped to DMX color values.

Angelic Pad- White
Rhythmic Bass – Red
Sub- Blue

EX 4- Combined Voicing, New Permutations

The following is an example of how the basic colors and voicing work in conjunction with one another generating new effects and color combinations but are still able to return to their original state (red, white, blue) when played individually.

EX 5- Adding Shadows

Improvisation combining and switching between voices and lighting control while experimenting with effects produced by shadows.

Findings: Using the Enttec DMX PRO and a projection screen yields a higher-quality dynamic range of expression. The lights are significantly more responsive, but the setup still requires tweaking to generate a greater range of fade values (e.g. to contrast quiet with loud and to create a pulsing effect for pulsating sustained sounds).

Our camera distorts when recording sounds linked to the strobe parameter. This phenomenon creates an effect in itself and possibly could be captured, projected, and even fed back and looped, as it looks quite interesting.

Shadows created from behind the screen create an intriguing result and may be useful  to extend meaning, depth and character if carefully executed or choreographed.  Additionally, experimenting with placing myself with my bass guitar or another performer behind the screen might help tie together ideas for a more integrated and engaging audio visual instrument.

Critical Analysis and Moving Forward:

Although these experiments show progress and potential, there is still much work to be done on both the audio and visual fronts. At this stage, as is and on its own, I do not envision my instrument being dynamic or compelling enough to sustain meaning and interest for extended periods of time. As my group has been working on experiments individually, it’s been difficult to directly assess how this will function as part of a larger performance. Moving forward, I aim to develop my instrument further in its own right as well as work towards integrating it as a subsection of the ensemble.

Plans for further development:

Dial in specific sounds, effects and performance techniques that optimize sound generation and DMX feedback that work well standalone as well as with the rest of the ensemble.

The audio to MIDI conversion needs to be refined.  Tracking bass frequency is no easy task and I often get unexpected and false triggered notes.

Set up foot controllers to aid in modulating sonic and visual elements. Up to this point, these have not been implemented.

Tweak and scale modulation sources and envelope followers to be more dynamic with mode, strobe and fade values.  Develop precise control mechanisms that will allow for better ways of expressing relationships between sound and silence.

Investigate incorporating a dimmer pack and setting up additional lighting sources (lamps) that can be placed around the stage or in the audience. One idea includes switching on/off various audio effect processing or changing instrument voicing to trigger corresponding light states (on, off, dim, bright, flicker).

Work as a group to quickly identify a unifying theme.  Create a map of how our performance will move throughout time and devise how we can keep it engaging throughout its duration. Schedule regular group rehearsals in an effort to better understand how we operate as a dynamic and cohesive unit.
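For the audio to MIDI refinement listed in the plans above, the core arithmetic of turning a detected pitch into a note number, plus one simple guard against false triggers, can be sketched in Python. This is not how Live or any particular converter works internally; `freq_to_midi` uses the standard equal-temperament formula, while `stable_notes` and its `hold` parameter are hypothetical helpers that only pass a note on once it has been detected several times in a row.

```python
import math
from typing import Iterable, Iterator


def freq_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Nearest MIDI note number for a frequency (A4 = note 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def stable_notes(notes: Iterable[int], hold: int = 3) -> Iterator[int]:
    """Yield a note only after it has been seen `hold` times in a row."""
    last = None   # most recent detection
    count = 0     # how many times in a row it has been seen
    sent = None   # last note actually passed on
    for n in notes:
        count = count + 1 if n == last else 1
        last = n
        if count >= hold and n != sent:
            sent = n
            yield n


if __name__ == "__main__":
    print(freq_to_midi(41.2))  # low E on a four-string bass
    detections = [28, 28, 40, 28, 28, 28, 31, 31, 31]
    print(list(stable_notes(detections)))
```

Holding out for several consistent detections filters out stray octave errors such as the lone 40 above, at the price of a little extra latency, which is already a concern at bass frequencies where each cycle of the wave lasts a long time.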

Audio-Visual Instrument: Lissajous Beat Organ

Study:  Color, Sound, Light, and Lissajous Figures


My goal was to create an instrument that artistically actualizes the phenomenon of phase interference, which can be heard as auditory beats and seen as Lissajous curves. I initially added color in a way historically described by one of my favorite composers, Alexander Scriabin. After experimentation, however, I found a color and shape palette that is more true to my own musical-visual experiences.

Because of our group’s discussions about the importance of a final performance that is more interactive than passive, I decided to use controllers to modify audio and visual components in real time. In this version, a gametrak (piloted by Jessamine XinYan Zhang) is being used to control the camera angles and object rotation (based on a patch from, and a knock-off “Air-Flow” PS3 controller (bought for 2 pounds from a Car-Boot sale in the Omni-Center in Edinburgh) is used to control sonic elements and visual palettes.

Audio-Visual Interpretation

Here is the screen capture and audio from the computer from the February 26, 2014 performance:

Overall I am pleased with the interaction between sound and visuals. The live performance aspects, however, have room for improvement. Here is the live performance from the Alison House Atrium from February 26, 2014:

Overall, the sound/visual interaction was successful, but the performers are too dark to be seen!  It’s nearly impossible from this video to see the intimate interactions between performers, sounds, and visuals.  For a next performance, better lighting will be used.

Technical Information

After a search for a way to create Lissajous curves on the C74 blog (Max/MSP website), I found an efficient way to render a 2 dimensional figure using Jitter and OpenGL.  After carefully studying a patch from Oli Larken, I managed to make this shape 3 dimensional with the Z dimension and brightness being modified from live sound input.  This is my hacked version, not in presentation mode:


You can see the gametrak patch mentioned earlier in the top left corner.

Some data mapping was necessary to allow for an aesthetically pleasing sound and visual interaction, but most of these mapping decisions were based on psycho-acoustic boundaries. For example, Lissajous figures look most interesting (to me) when the frequency drawing them is under 20 Hz, but human hearing only starts at 20 Hz. In addition to this perceptual consideration, I noted that Lissajous figures become fabric-like as the two sine waves generating the figure separate by more than about 5-10 Hz (depending on initial frequency). But with frequency separation this great, human hearing segregates these sounds into separate tones instead of a single timbre. I mapped data accordingly so as to not detract from either the sonic or the visual aesthetic of the instrument.
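As a minimal sketch of how such a figure arises, the Python function below samples a two-dimensional Lissajous curve from two sine waves. It stands in for the Jitter/OpenGL rendering only conceptually; the function name `lissajous` and its parameters are assumptions, and the third dimension and brightness driven by live sound are left out.

```python
import math
from typing import List, Tuple


def lissajous(fx: float, fy: float, phase: float = 0.0,
              n: int = 512, duration: float = 1.0) -> List[Tuple[float, float]]:
    """Sample n points of a Lissajous figure over `duration` seconds.

    x follows a sine at fx Hz (offset by `phase` radians) and y follows a
    sine at fy Hz, so both coordinates always stay within [-1, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    points = []
    for i in range(n):
        t = duration * i / n
        x = math.sin(2 * math.pi * fx * t + phase)
        y = math.sin(2 * math.pi * fy * t)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Two frequencies in a simple 3:2 ratio trace a closed, stable figure.
    for x, y in lissajous(3.0, 2.0, n=8):
        print(f"{x:+.3f} {y:+.3f}")
```

When the two frequencies sit in a simple ratio the curve closes on itself; detuning one of them slightly makes the figure appear to rotate slowly, which is the visual counterpart of the audible beating.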

As mentioned above, the sonic and visual components are based on the phenomenon of phase cancellation that occurs between two slightly out-of-tune sine waves. There are three sound generators in this instrument, each of which controls the visuals via shared data input or mapped envelope followers. This is my performance patch:
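The beating itself follows from a trigonometric identity: the sum of two equal-amplitude sines at f1 and f2 equals a sine at their average frequency multiplied by a slow cosine at half their difference, so the loudness swells and cancels |f1 - f2| times per second. The Python sketch below checks this numerically; the function names and the example frequencies are hypothetical and not taken from the patch.

```python
import math


def beat_frequency(f1: float, f2: float) -> float:
    """Number of loudness swells per second heard between two tones."""
    return abs(f1 - f2)


def beating_pair(f1: float, f2: float, t: float) -> float:
    """Sum of two unit-amplitude sine waves at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def beat_envelope(f1: float, f2: float, t: float) -> float:
    """Slow amplitude envelope of the sum: 2 * |cos(pi * (f1 - f2) * t)|."""
    return 2 * abs(math.cos(math.pi * (f1 - f2) * t))


if __name__ == "__main__":
    f1, f2 = 220.0, 223.0
    print("beats per second:", beat_frequency(f1, f2))
    # Halfway through a beat period the two waves cancel almost completely.
    t_cancel = 1 / (2 * beat_frequency(f1, f2))
    print("envelope at cancellation:", beat_envelope(f1, f2, t_cancel))
```

An envelope follower placed on the summed signal would trace roughly this slow curve, which is one way a beat can be turned into control data for the visuals.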


Here is the patch that is processing the audio:


You can see the PS3 controller patch that I am using at the top of the screen.

The only extra sound effect used in this version (other than the exploitation of phase cancellation in sine waves triggering low-pass, high-resonant filters in a rhythmic way) is reverb. Specifically, I am using a series of Max/MSP externals called HISSTools. These externals allow me to take incoming sound and convolve it through multiple reverbs. Although many impulses are loaded, the main impulse response that I am using is one that I personally recorded at the University of Edinburgh pool in the fall.
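Convolution reverb means that every input sample triggers a scaled copy of the recorded impulse response, and all those copies are summed. The Python sketch below shows that operation in its direct, slow form purely for illustration; it is not the HISSTools implementation, and real-time convolvers generally use FFT-based methods to stay efficient with long impulse responses.

```python
from typing import List, Sequence


def convolve(dry: Sequence[float], impulse: Sequence[float]) -> List[float]:
    """Direct-form convolution of a dry signal with an impulse response."""
    if not dry or not impulse:
        return []
    out = [0.0] * (len(dry) + len(impulse) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue  # silent samples contribute nothing
        for j, h in enumerate(impulse):
            # Each input sample launches a scaled copy of the impulse response.
            out[i + j] += x * h
    return out


if __name__ == "__main__":
    click = [1.0, 0.0, 0.0]
    room = [1.0, 0.5, 0.25]  # toy stand-in for a recorded impulse response
    print(convolve(click, room))
```

A single click comes out as the impulse response itself, which is why recording a sharp sound in a space such as a swimming pool is enough to capture its reverberation.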


Final Notes

By using controllers, the Lissajous Beat Organ takes on an exciting life of its own!  Even though I sometimes forget my own controls, the way in which the PS3 controller is mapped allows for new sounds to be created from a wide array of gestures.  The sonic and visual textures that can be created in real time would be nearly impossible to achieve with only a mouse or track pad.

For future versions, controls will be mapped without using global send and receive objects. In this way, I will have more ways to change my sounds and visuals throughout the piece. In addition to this, as rehearsals take place, the sonic and visual content of this instrument will be modified to meet the needs of the group as a whole. Perhaps the gametrak will control a sonic element, or audio generated by another performer and modified in my instrument. I look forward to seeing how this instrument will work to create a dialogue and eventual performance with the other members of the Audio-Visual Ensemble in the coming weeks!