Submission 2: Mockingbird




Mockingbird is a composition developed by five master's students from Sound Design, Composition for Screen, and Digital Composition and Performance between February and April 2013. The piece grew out of insights gained from a previous project that revolved around Christian Wolff's For 1, 2, or 3 People (1964). This documentation explains the concepts and technology involved in the execution of the piece. The project developed from a shared pool of interests and was influenced by works across different styles. This page examines the project's major components, the visual and aural scores, as well as its technical aspects, including the real-time digital signal processing system.


Visual Score and Aural Score

Aural Score:

The aural score for Mockingbird developed and morphed substantially from its inception. As a starting point, field recordings were made in order to obtain material for composing a soundscape: a form of electroacoustic music that often consists of field recordings of environmental sound and biophony, as well as other pre-recorded sounds. The recording was carried out with the influence of soundwalking in mind, a form of excursion that focuses the listener on the sound of the surrounding environment. As Hildegard Westerkamp puts it in 'Soundwalking', from Autumn Leaves: Sound and the Environment in Artistic Practice:

“It is exposing our ears to every sound around us no matter where we are… Wherever we go we will give our ears priority. They have been neglected by us for a long time and, as a result, we have done little to develop an acoustic environment of good quality.” (Westerkamp, 2007, 49)

Field recordings were made both indoors and outdoors, then processed and composed into an aural score. To create the experience of a sonic excursion, all the field recordings were made onsite during actual soundwalks taken by members of Action | Sound. Some recordings were made with a pair of DPA 4061 omni miniature microphones, to allow the possibility of recreating the experience through binaural mixing; at the time, the playback format for the performers during the performance had not yet been decided. In the end the project did not use binaural mixing, because the soundscape became part of the performance and was played back into the auditorium. Alongside the DPA 4061s, a Zoom H4n was also used during the field recording, to allow more editing options when creating the soundscape. Artists known for their soundwalk work, such as Peter Cusack, Andra McCartney, and Westerkamp herself, influenced how Mockingbird's soundscape was captured in its raw form. The project took a particular interest in Soundwalking Interactions' Résonances de la Fontaine (2012), an improvisation project led by Andra McCartney that revolved around an instructed soundwalk in Parc Lafontaine, Montreal, leading to an instrumental performance inspired by each participant's experience of the excursion in the park.

Upon reflection it was decided that the best course of action was to create an aural score blending the field recordings with processed versions of them and more gestural sound design. This required considerable post-production and editing: space needed to be created for the vocal utterances to take place. Composing the aural score involved reacting to performances we did on a day-to-day basis, cutting elements out and accentuating others. It also became apparent that we did not want the performance to be completely centred on the performers, but rather that one element would feed off another, creating an ecosystem of sound in the room for both the audience and the performers to experience. We found this trade-off worked well in hinting at a soundwalk experience without distracting too much from the performance itself. The main goal of the score was to give the performers something to react against, but also something the audience could identify with, further diminishing the distance between performer and audience. Collective experience was a major consideration in this project; at one point even the idea of having an audience present was up for debate, though it was conceded that, should it be required, audience engagement could take place. The field-recording elements of the score give the audience something to hold onto, a prompt, while remaining something the performers can engage with. The aural score was, in a sense, a bridge between audience and performer.

Visual Score

Following on from our experiences exploring the graphic notation/instruction system used by Wolff in For 1, 2, or 3 People, we decided to pursue a more innate and less literal system for communicating performance instructions.

One area that we felt limited the efficiency of Wolff's system was its reliance on extensive written performance instructions referenced by graphic symbols. This required considerable time and effort on the part of performers, who first had to learn these complex instructions and then their cues within the score.

While we did not see this time as wasted, we noticed a lag in responding to the cues, which we attributed to both cognition and transliteration processes.

We aimed to develop a compositional system that combined a simplified graphic score element, in this case bars of seven colours (depicting seven discrete performance instructions, among them Compliment, Repetition, Interruption, Dynamics of Speech, Mouth [unvoiced] and Imitation), with three bar heights depicting three amplitude levels and bar length depicting performance time. This score would be communicated both by the printed image and by coloured LED lights.

The complexity of the performance instructions is further increased by their interaction with six vocal-technique instructions (vowel, consonant, guttural, whisper, humming and breathing) and by their directed interaction with the aural score (for example, a guttural imitation of the audio score at a specific point).

Technical Specification.

The technical requirements for the light cue system were:

– Stable enough to be used in a live performance setting.

– Clearly visible to performers under different lighting situations.

– Easily portable.

– Able to produce both the light sequence itself and a printable version of the graphic score.

– Easily networked and capable of communicating with other score elements, i.e. fixed-media control and DSP processing of vocals.


I decided that a system based around an Arduino platform driving six multicoloured super-bright LEDs would meet these specifications and, through the Maxuino software, would interface well with Max/MSP.

LED selection.

Typically, common cathode RGB LEDs are used with Arduino boards, as these provide a simple method of producing a very wide range of colours. However, common cathode LEDs require three PWM (pulse width modulation) outputs for each LED. As I needed to control six LEDs from a single Arduino, this would have required at least eighteen PWM outputs, and with amplitude (brightness) control, twenty-four.

The current highest-spec Arduino format (the Mega) has a total of 16 PWM outputs. Previous experience has shown that using multiple Arduino systems with Maxuino is highly problematic, and additional boards would have exceeded the project budget.

After some experimentation, I opted for common anode RGB LEDs. By switching combinations of the colour inputs to ground and sending a PWM (variable voltage) signal to the common anode, I could produce seven discrete colours of variable luminescence while using only a single PWM output for each LED.
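The seven-colour count above falls out of simple set arithmetic: with one cathode per colour channel, the non-empty combinations of grounded channels give exactly seven mixes. A minimal sketch (in Python rather than the original Arduino code, with assumed colour names):

```python
# Sketch (not the original Arduino firmware): with a common-anode RGB LED,
# each colour channel lights when its cathode is switched to ground.
# The seven non-empty subsets of {R, G, B} give seven discrete colours,
# while a single PWM value on the shared anode sets overall brightness.
from itertools import combinations

CHANNELS = ("R", "G", "B")
NAMES = {
    ("R",): "red", ("G",): "green", ("B",): "blue",
    ("R", "G"): "yellow", ("R", "B"): "magenta",
    ("G", "B"): "cyan", ("R", "G", "B"): "white",
}

def available_colours():
    """Enumerate every non-empty combination of grounded cathodes."""
    subsets = []
    for n in range(1, len(CHANNELS) + 1):
        subsets.extend(combinations(CHANNELS, n))
    return [NAMES[s] for s in subsets]

print(len(available_colours()))  # 7 discrete colours from one PWM pin per LED
```

The trade-off is that brightness is shared across the lit channels of each LED, which is acceptable here since the score only needs seven flat colours at three brightness levels.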

Software system.

The software section of the light system (Action Sound Conductor, as_conductor) was built using Max/MSP and Maxuino, the Arduino interface software. The specification for the build was:

– Must be built around Max/MSP to allow interfacing/networking with other components of the system.

– Must allow easy editing of performance structure.

– Must simultaneously produce printable graphic score.


The screenshot above shows as_conductor configured for five channels; however, the system was designed to be six-channel. The upper left-hand section shows counters for bars and beats, minutes and seconds, and a global decimal second count.

The three central control buttons are: Play, which starts the performance timeline (sending a control start message via the network to the other two laptop Max/MSP systems); Reset, which resets the system; and Score, which outputs the graphic score via the Max LCD object in .PNG format.

Score as output by Max/MSP

The light instructions are processed by the Max object as_vox. as_vox requires four arguments: colour, start time (seconds), end time (seconds) and brightness (low, medium or high). In the example below, this instance of as_vox creates a cyan light of low brightness, starting at 10 seconds and ending at 20.
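The real as_vox is a Max object, but its four arguments amount to a small data record. A sketch of that data model in Python, with assumed PWM values for the three brightness levels:

```python
# Illustrative data model of an as_vox-style light instruction
# (colour, start time, end time, brightness). The PWM values for the
# three brightness levels are assumptions, not the patch's real numbers.
from dataclasses import dataclass

BRIGHTNESS_LEVELS = {"low": 85, "medium": 170, "high": 255}  # assumed PWM steps

@dataclass
class LightInstruction:
    colour: str
    start_s: float
    end_s: float
    brightness: str

    def pwm_value(self) -> int:
        """Anode PWM value for this instruction's brightness level."""
        return BRIGHTNESS_LEVELS[self.brightness]

    def active_at(self, t: float) -> bool:
        """Is this light cue lit at time t (seconds)?"""
        return self.start_s <= t < self.end_s

# The example from the text: cyan, low brightness, 10 s to 20 s.
cue = LightInstruction("cyan", 10.0, 20.0, "low")
print(cue.active_at(15.0), cue.pwm_value())  # True 85
```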

Tech Development

Documentation on the sound-effect system

On the issues of sound manipulation

1. The condition for sound manipulation

In this performance, light signs of different colours are transmitted to all the performers at different points on the timeline, and each performer is required to react differently to each colour as the timeline progresses. Sound manipulation therefore had to support both each performer's reaction and the whole structure as time progresses.

Under this condition, I made the sound-effect system flexible, in that the voice of each performer can control the effect parameters, and fixed, in that the colour sign selects those parameters. In addition, some other conditions were taken into consideration to achieve a more dynamic result.

2. The method of using human voice as a controller (using Max/MSP software)

All the performers can control the parameters of the sound effects using their own voices. I focused on the potential of the pitch and amplitude information in the human voice, both of which can be detected easily in Max/MSP. Of the many objects that analyse pitch and amplitude, I chose the fzero~ object, which offers threshold settings for pitch and amplitude range. Although it is not suited to polyphony, this does not matter, since a human voice is monophonic at any one moment. With this object, each performer can control the effect mix level and pan position, as explained below.
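The control flow just described (detected pitch and amplitude scaled into parameter values) can be sketched outside Max like this; the frequency and amplitude ranges are assumptions, not the patch's actual settings:

```python
# Illustrative sketch of the voice-to-parameter mapping: detected pitch
# and amplitude (supplied in the patch by fzero~-style analysis) are
# clipped to a working range and scaled to parameter values.
def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Linear map with clipping, like Max's scale/zmap behaviour."""
    value = min(max(value, in_lo), in_hi)
    span = (value - in_lo) / (in_hi - in_lo)
    return out_lo + span * (out_hi - out_lo)

def voice_to_controls(pitch_hz, amplitude):
    """Map a pitch/amplitude reading to (pan position, effect mix %)."""
    pan = scale(pitch_hz, 80.0, 800.0, -1.0, 1.0)   # assumed vocal pitch range
    wet = scale(amplitude, 0.0, 1.0, 0.0, 100.0)    # amplitude -> mix percent
    return pan, wet

print(voice_to_controls(440.0, 0.5))  # (0.0, 50.0): mid pitch, half mix
```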

Also, because we are not trained vocalists and have to control the effect levels using only our voices, without any MIDI controller, certain effect levels are stored as presets. The presets change according to the colour sign from the 'Conductor' laptop, which arrives over an Ethernet cable.

Therefore, the whole effect system is driven by the messages from the 'Conductor' laptop and by the pitch and amplitude information from each performer's voice.

3. Sound manipulation systems

All the sound effects and control systems were built in Max/MSP.

1)  Panning system (with the pitch of voice)

The pitch of each performer's voice controls the panning system. Each performer is assigned one of five pan positions: 'Left', 'Right', 'Center', 'Central Left' or 'Central Right', and each position shifts according to the pitch of that performer's voice. As the pitch rises, the 'Left' position moves towards 'Right', the 'Right' position moves towards 'Left', the 'Central Left' position moves towards 'Right', the 'Central Right' position moves towards 'Left', and the 'Center' position moves to left and right around the centre. This panning system lets the performers create a more dynamic and spectacular live performance, and it also enables a kind of 'call and response' way of performing. For instance, the performer assigned the 'Right' position can move their sound towards the performer at the 'Left' position by producing more aggressive, higher-pitched sound; the 'Left' performer can then react by moving their position from 'Left' to 'Right', while the performer at 'Center' holds the centre, producing a more dynamic and balanced result.
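The mirroring behaviour described above can be sketched as a simple interpolation from each performer's home position toward its opposite as pitch rises. This is a sketch under assumed pitch bounds, not the patch itself (and in the real patch 'Center' oscillates either side of centre rather than staying fixed):

```python
# Sketch of the pitch-driven pan behaviour: each performer has a fixed
# home pan position, and higher pitch pushes the sound toward the
# mirrored position (Left <-> Right). Pitch bounds are assumptions.
HOME = {"Left": -1.0, "Central Left": -0.5, "Center": 0.0,
        "Central Right": 0.5, "Right": 1.0}

def pan_for(seat, pitch_hz, lo=80.0, hi=800.0):
    """Interpolate from the home position toward its mirror as pitch rises."""
    amount = min(max((pitch_hz - lo) / (hi - lo), 0.0), 1.0)
    home = HOME[seat]
    mirror = -home
    return home + amount * (mirror - home)

print(pan_for("Right", 80.0))   # 1.0  (stays home at low pitch)
print(pan_for("Right", 800.0))  # -1.0 (fully mirrored at high pitch)
```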


2) Effector mixture controlling system (with the amplitude of voice)

The amplitude of each performer's voice controls the level of the effect mix. This builds on the observation that a performer wanting to be more expressive usually gets louder and louder. So, as the amplitude of the voice rises, the effects allocated to that performer transform the original sound more heavily. For instance, a very loud voice can leave the original sound totally distorted, pitch-shifted with a wider gap between the original and manipulated sound, or more heavily granulated.

In addition, the amount and range of the effect mix can also be controlled manually, in case untrained vocal performers have trouble producing a wide range of voice amplitude. A performer without a wide amplitude range can thus set things up in advance so that the effected sound is produced within his or her range.
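The per-performer calibration just described can be sketched as a closure over each performer's usable amplitude range; the range values here are placeholders, not measured settings:

```python
# Sketch of the amplitude-driven mix: each performer's usable amplitude
# range is calibrated in advance (the manual control mentioned above),
# and loudness within that range sets the wet level of that performer's
# effect chain. The range numbers are assumptions.
def make_mix_control(amp_floor, amp_ceiling):
    """Return a function mapping this performer's amplitude to a 0..1 wet level."""
    def wet_level(amplitude):
        if amp_ceiling <= amp_floor:
            return 0.0
        level = (amplitude - amp_floor) / (amp_ceiling - amp_floor)
        return min(max(level, 0.0), 1.0)
    return wet_level

# A performer with a narrow dynamic range still reaches the full mix.
quiet_voice = make_mix_control(0.05, 0.4)
print(quiet_voice(0.4))   # 1.0: fully effected within this performer's range
```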


3) Effector trigger and preset system (with the message from ‘Conductor’)

– The types of sound effectors

Each performer has a unique sound effect, while all performers share 'Equalizer', 'Delay' and 'Reverb' effects. The unique effects are 'Waveshaping distortion', 'Flanger', 'Pitch-shifter', 'Granular' and 'Freezer'. The 'Waveshaping distortion' and 'Granular' effects roughen and squash the sound; the 'Flanger' and 'Pitch-shifter' produce a more dynamic frequency range through modulation and variable pitches; and the 'Freezer' effect is combined with an 'auto-pan' effect and another 'pitch-shifter' that operate automatically and independently. The frozen sound from a performer can thus be reproduced dynamically, moving between left and right at exponentially high or low speed and with variable pitches. Finally, a 'Limiter' is placed before the output, so that clipping faults and the overall amplitude level can be controlled.

– Trigger and preset system

All the sound effects are equipped with preset functions and are triggered when certain messages arrive from the 'Conductor' laptop. Each message carries colour information, e.g. 'blu' for the blue colour sign; the 'blu' token then triggers a particular combination of effect presets. All the effect parameters are stored as presets in a text file using the 'coll' object in the patch, which means the presets can be modified at any time and each performer's preferences can easily be incorporated in advance.

Each performer has different preset settings, and even the same colour message does not trigger the same effect preset for every performer. All the presets assign different effect levels according to the overall progress of the graphic score from the 'Conductor' laptop. For instance, if the 'Conductor' sends the 'blue' colour three times during the piece, a different effect preset is triggered each time, in consideration of the overall progress, so no performer hears the same preset twice during the performance. To modify the preset settings, performers can type different preset numbers into the 'coll' data; the preset of each effect can be modified by shift-clicking its preset button after manipulating each parameter.
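The "same colour, different preset each occurrence" behaviour amounts to a lookup keyed on both the colour token and how many times it has arrived. A minimal sketch of that coll-style table (the preset numbers are placeholders, not the real data):

```python
# Sketch of the coll-style preset lookup: presets are keyed per
# performer on the colour message AND on how many times that colour
# has occurred, so the same colour never recalls the same preset twice.
class PresetBank:
    def __init__(self, table):
        self.table = table          # (colour, occurrence) -> preset number
        self.counts = {}            # how many times each colour has arrived

    def trigger(self, colour):
        """Recall the preset for this colour's next occurrence."""
        n = self.counts.get(colour, 0)
        self.counts[colour] = n + 1
        return self.table[(colour, n)]

# Placeholder table for one performer: three 'blu' signs, three presets.
performer_1 = PresetBank({("blu", 0): 3, ("blu", 1): 7, ("blu", 2): 12})
print([performer_1.trigger("blu") for _ in range(3)])  # [3, 7, 12]
```

Keeping the table in external data (as the patch does with a text file and 'coll') means the mapping can be re-edited between rehearsals without touching the patch.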

4) Real-time recording and playback system (with the message from ‘Conductor’)

To provide sound sources for an ambient effect, one further processor was added. The sound from the performers is recorded in real time during the performance into six different buffers, each holding a five-second sample. On playback, the samples are manipulated with exponentially variable pitch, speed and amplitude through the auto-panning system and an additional reverb effect. This reverb differs from the one used for each performer: it has a sound-diffusion parameter, which gives the samples their ambient character.

The recorded samples are played back in two ways: in the first, the samples are played one after another at slightly different playback speeds; in the second, samples are selected at random with pitches going exponentially up and down, in the manner of a polyphonic pitched instrument.

Lastly, this system operates on a time basis, which can be set in a text file via the 'coll' object in Max/MSP. The time information in the messages from the 'Conductor' laptop thus triggers the real-time recording and playback functions.
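The buffer scheme described above (six five-second buffers written round-robin, then replayed in sequence or at random) can be sketched as follows; buffer contents here are just labels standing in for audio:

```python
# Sketch of the recording/playback scheme: live input is written
# round-robin into six 5-second buffers, then replayed either in
# sequence or in random order (pitch/speed variation omitted here).
import random

class LoopBank:
    def __init__(self, n_buffers=6, length_s=5.0):
        self.buffers = [None] * n_buffers
        self.length_s = length_s
        self.write_pos = 0

    def record(self, sample):
        """Write the next 5-second capture into the next buffer, wrapping."""
        self.buffers[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % len(self.buffers)

    def play_sequential(self):
        """First playback mode: filled buffers in order."""
        return [b for b in self.buffers if b is not None]

    def play_random(self, rng, n=4):
        """Second playback mode: random selection from the filled buffers."""
        filled = [b for b in self.buffers if b is not None]
        return [rng.choice(filled) for _ in range(n)]

bank = LoopBank()
for i in range(7):                  # the 7th capture overwrites buffer 0
    bank.record(f"capture-{i}")
print(bank.play_sequential()[0])    # capture-6
```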



Documentation on the audio playback system

This Max/MSP patch is the audio playback and manipulation system, intended as a flexible and convenient tool for the rehearsal stage and for composition. It helps performers experiment with new ideas in composition and performance. Basically, the system consists of the audio playback system and the condition system.

1. Audio playback system

Primarily, this patch is made to play back the composed audio score, with each track triggered at a set time. Its core components are the buffer~ and groove~ objects, which allow users to manipulate the playback speed and volume in real time. By dropping an audio file into the drop-file space, users can see the audio file's name, its waveform, and its duration in milliseconds and minutes.


The audio playback system is built from three main objects: groove~, sig~ and buffer~ (image 02). The sig~ object receives a number from the condition system and converts it into the audio signal used as the playback speed value.



2. The condition system

Time condition system

It consists of pitch, amplitude and cue playback systems (image_04).

The purpose of the system is to prepare, shape and manipulate the audio file. It works on the same timeline as the conductor system, so we can set the time points (in milliseconds) for playing each audio cue, and we can also automate its volume and playback speed in real time at any specific time point.

This was especially useful during rehearsals, while we could still rearrange the performance score, since it let us easily manipulate and mix the audio score with the performed voices, such as the whispering and humming parts.


The pitch and amplitude condition systems are built with select~ objects that receive the time from the conductor system; on a match they bang and send the pitch, amplitude and speed values to line objects, whose output goes to the playback system (image_05).


The cue playback system is built with the zmap~ object. It waits for the set time to trigger the playback cue, then sends the playback cue number to the select~ object, which selects the audio cue number (audio track).
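The cue-trigger logic just described, stripped of the Max objects, is a comparison of the conductor's clock against stored time points. A sketch with placeholder cue times (the real values live in the patch's timeline data):

```python
# Sketch of time-point cue triggering: when the conductor's clock
# reaches a stored time point, the corresponding audio cue number is
# emitted. The time points and cue numbers here are placeholders.
CUE_TIMES_MS = {0: 1, 15000: 2, 42000: 3}   # time point (ms) -> cue number

def cues_fired(elapsed_ms, already_fired):
    """Return cue numbers whose time points have been reached but not yet fired."""
    fired = []
    for t, cue in sorted(CUE_TIMES_MS.items()):
        if t <= elapsed_ms and cue not in already_fired:
            fired.append(cue)
    return fired

print(cues_fired(16000, set()))  # [1, 2]: both early cues are due by 16 s
```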



Tech Rider:

2 x Genelec 1031A

1 x Mackie Onyx

1 x Fireface

1 x DAV 8-channel preamp

3 x RE20

2 x Sennheiser MD421

EV PA + sub

5 x mic stands

5 x XLR to XLR

16 x 16 jack loom

4 x Ethernet cables

1 x Sound Devices

2 x DPA omnis

For Mockingbird, the 2013 project of the Action | Sound group, several considerations were made in order to produce what was deemed to be the most effective result; these are analysed below.

When undertaking a collaborative project, there are obvious dynamics involved that will affect the outcome of the work: artistic differences, schedule clashes, hierarchies of knowledge, and so on. However, when working in what can be a stressful process, it is necessary to consider the positive aspects of working with others as part of a team. First, there is the fact that you are working with people in the flesh, not through computer networks but face-to-face, which has been shown to improve the quality of collaborative work.[1] In retrospect, the moments when our project suffered were when we tried to mediate through email, social networks and the like. It was when we worked with each other in a shared space that feedback dynamics were harnessed, opinions were instant, and development was consistent. The feedback dynamic of group work was key to the development of this project, as direct knowledge and interactional dynamics have been shown to benefit productivity.[2] In an age where technology buttresses so many aspects of creative practice, we took certain measures to make sure that technology was an aid, not a necessity.

Returning to the point of not being dependent on digital media: we determined from our first submission that vocal performance had yielded the best results, both because it was something we were all comfortable using freely and because it levelled the playing field for each performer. For Mockingbird, different vocal tools were determined through trial and error. We settled on whisper, guttural sound, vowel, humming, breathing and consonant, as these seemed to work best in both complementary and antagonistic terms. These tools were then controlled by the parameters compliment, repetition, interrupt, speech dynamics, mouth movement and imitate. These would interact at times, but at other times allow space for one performer to challenge another.
The key influence from the previous performance of Christian Wolff's For 1, 2 or 3 People is that an open work does not mean 'anything goes'. We allow for openness of performance, but there are parameters that suggest a style of utterance to the performer, though the performer is very open in how they react to those stimuli. No two performances of Mockingbird were ever alike, and each was as important to the piece's development as the last, and will continue to be.




Tutty, J. I. and Klein, J. D. (2008) 'Computer-Mediated Instruction: A Comparison of Online and Face-to-Face Collaboration', Educational Technology Research and Development, Vol. 56, No. 2, pp. 101-124.

Cramton, C. D. (2001) 'The Mutual Knowledge Problem and Its Consequences for Dispersed Collaboration', Organization Science, Vol. 12, No. 3, pp. 346-371.

Westerkamp, H. (2007) 'Soundwalking', in Carlyle, A. (ed.) Autumn Leaves: Sound and the Environment in Artistic Practice, Paris: Double Entendre, p. 49.

Dave, M. (2012) Résonances de la Fontaine, weblog post, 27 May, accessed 22 April 2013.


[1] Tutty/Klein, 2008, p. 101.

[2] Durnell Cramton, 2001, p. 347.