Sound of Shanghai: The Art and Technology of Spatial Audio Reproduction

At the heart of this research stands «Sound of Shanghai: Opera of a Cosmopolis», a synthesis of art, acoustics, and technology that redefines the experience of spatial sound.

By Iverson Zhende Xu
Key Laboratory of Artificial Intelligence Music Therapy, Shanghai Conservatory of Music
Full Dimension Multimedia Technology (Shanghai) CO., Ltd.

Abstract

Taking «Sound of Shanghai: Opera of a Cosmopolis» as a case study, this article explores the project’s technical and artistic dimensions from the perspective of a spatial audio recording engineer.

It first introduces the composer and the contextual significance of his work, followed by a brief analysis of the three compositions that define the project. The article then elaborates on the acoustic characteristics of the recording venue, detailing its planned renovations and the integration of Wave Field Synthesis (WFS) spatial acoustic enhancement technology to optimize spatial audio recording.

Building on this foundation, it examines the microphone design and recording strategies tailored for each track, ensuring precise spatial sound capture. Finally, the study investigates object-based WFS spatial audio production, exploring its potential applications in cinematic sound presentation, Full Dimension spatial audio systems, and customized sound installations.

By advancing immersive auditory experiences, this research highlights the intersection of music, technology, and spatial design, shaping the future of spatial audio production and artistic expression.


Keywords: Sound of Shanghai, spatial audio, recording art, WFS (Wave Field Synthesis)

Contents.

From Musical Vision to Spatial Audio Realization
01.

Introduction

Frames the project as an East–West fusion and a spatial-audio exploration.

02.

Composer & Work

Positions MJ Elia and «Sound of Shanghai» as culturally integrative and Shanghai-focused.

03.

Pieces (Overview)

Three works blending tradition with contemporary language and orchestration.

04.

Recording Venue and WFS

Rehearsal hall acoustically renovated; WFS system creates consistent, controllable space.

05.

Sound Pickup Design

Purpose-built mic schemes (overheads, PZM, spots; VR1/VR2/RN17/RNT) for precise spatial capture.

06.

Spatial Audio Production

3D mic data + object-based workflow; «spatial folding» for immersive XYZ rendering.

07.

Conclusion

Art–tech–heritage synergy; advances China’s spatial-audio practice and global dialogue.

Acoustic Architecture & Spatial Control System

Integration of distributed loudspeaker topology, multi-point microphone arrays, and AI-assisted object-based spatial reproduction.

  • Loudspeakers in the Wave Field Synthesis environment: 624
  • Wall-mounted modules: 40
  • Dual-layer ceiling arrays: 10
  • Suspended 3D overhead microphones: 8
  • PZM ground capture points: 3
  • Dolby Atmos (9.1.6 mix configuration): object-based immersive reproduction
  • IMF (IOSONO) pipeline
  • AI-driven field controller
  • Real-time spatial rendering
  • XYZ wave mapping precision


01.

Introduction

In today’s world of increasingly frequent cultural exchanges, music, as a universal language, plays a vital role in conveying emotions and bridging cultures. Composer Marios Joannou Elia, known for his profound academic and artistic achievements and international influence, has created a series of remarkable works that seamlessly blend Eastern and Western traditions. «Sound of Shanghai: Opera of a Cosmopolis» stands as one of his representative pieces. This project is not only a tribute to Shanghai as a global metropolis but also a bold exploration of the fusion between traditional Chinese music and contemporary musical language.

To bring the full musical depth of this work to life for audiences, the Key Laboratory of Artificial Intelligence Music Therapy at the Shanghai Conservatory of Music served as a key partner. Recording and production were carried out by the Full Dimension Multimedia engineering design team as a university–industry collaboration. This partnership opened a line of technical and artistic exploration in spatial audio recording. Written from Iverson Xu’s perspective as the recording engineer, this paper examines the technical details and behind-the-scenes stories of the production, with particular emphasis on his insights and contributions throughout the spatial audio process.

A selection of sE Electronics microphones used during the recordings,
including sE8, VR1, VR2, V7, RN17, and RNT models.

02.

Composer and Work:
Cultural Integration in the Context of the Times

Marios Joannou Elia is a composer of international acclaim whose works span symphonies, operas, chamber music, and other genres. His musical creations are distinguished not only by their academic depth but also by their engagement with social realities. Elia’s music often integrates elements from diverse cultural backgrounds, offering a unique perspective that shapes a new musical language.

«Sound of Shanghai: Opera of a Cosmopolis» is one of his recent works. Centered on themes of futurism and the urban culture of Shanghai, the project aims to capture the essence of the city through music. It seamlessly integrates elements of traditional Chinese music with contemporary compositional techniques, embodying the dynamic fusion of Eastern and Western cultures. This work is not only artistically significant but also socially relevant, highlighting Shanghai’s rapid economic and cultural development. It reflects China’s growing cultural confidence and innovative spirit in the context of an increasingly interconnected world.

The cover of the score for large Chinese orchestra and choir, «Welcome to the Future!» (2024).
The instrumentation list for the large Chinese orchestra, «Welcome to the Future!».
03.

Characteristics of the Pieces:
The Interweaving of Tradition and Modernity

«Sound of Shanghai: Opera of a Cosmopolis» features three distinct pieces, each performed by a different ensemble: the National Chinese Orchestra and the Jinqi Ruan Orchestra, both conducted by Professor Wu Qiang of the Shanghai Conservatory of Music; and the Guzheng Orchestra, joined by a female choir and sheng, led by the renowned guzheng performer Professor Qi Yao.

3.1. «Welcome to the Future!» — The Sound of the Future for Large Chinese Traditional Orchestra

1) «Welcome to the Future!»
From «Sound of Shanghai: Opera of a Cosmopolis», «Welcome to the Future!» is a piece originally composed for a large traditional Chinese orchestra and choir, with a duration of approximately 5 min. 45 sec.

2) Instrumentation Details:

  • Chinese Traditional Instruments: bangdi, qudi, xindi, high/medium/low sheng, suona (high/medium/low), yangqin, guzheng, liuqin, pipa, zhongruan, daruan, gaohu, erhu (I, II), zhonghu, timpani, bass drum, cymbals, woodblocks, three-tone gongs, various Chinese percussion, etc.
  • Western Instruments: harp, double bass, cello, vibraphone, etc.

3) Musical Style and Features:

  • Cultural integration: incorporates many traditional Chinese instruments while blending modern techniques such as glissando, double-tonguing, and tremolo; the melody draws on the pentatonic scale and interweaves Western harmonies and acoustic effects.
  • Urban & futuristic feel: rapid rhythms, complex textures, and layered instrumentation evoke a bustling, technologically driven metropolis.
  • Structure & dynamics: unfolds in multiple sections with clear dynamic contrasts, moving from soft dolce to powerful forte, with alternating instrumental groups creating vivid antiphonal echoes.
Beginning of «Welcome to the Future!»

The section opens with a tutti performance by the entire orchestra, creating a grand and majestic effect. High-register instruments like the bangdi, soprano sheng, and suona intertwine to produce bright, penetrating sonorities, complemented by the rhythmic decorations of the yangqin and pipa, evoking a distinct sense of futurism.

  • Rhythm & melody: rapid syncopated patterns combine with irregular interval leaps (fourths, fifths), infusing the line with tension that suggests technological drive and a touch of uncertainty.
  • Harmony & sound: traditional pentatonic materials (gong, shang, jue, zhi, yu) are interwoven with modern harmonic devices; stacked thirds and fourths generate contemporary timbral mass.
  • Performance character: a futuristic soundscape emerges as the idioms of traditional instruments are reimagined, softening boundaries between heritage and modernity to create something at once familiar and new.

Characteristics & details:

  • Dialogue of bangdi & soprano sheng: the bangdi’s swift upward scales contrast with the sheng’s sustained tones, symbolizing the interplay between rapid urban momentum and enduring traditions.
  • Yangqin ornamentation: tremolo and arpeggio figures emulate synthesizer-like shimmer from contemporary electronic music, heightening the work’s futuristic profile.
Excerpt from the score — «Welcome to the Future!» for large Chinese orchestra and choir (p. 3).
The Mid-section Theme «Innovation and Pioneering»

The mid-section introduces the bass sheng, zhonghu, and daruan, gradually incorporating the gaohu, erhu, and the choir’s semi-vocal gestures (humming, breathy sounds). This layered orchestration conveys innovation—moving from simplicity to complexity, from silence to emergence.

  • Rhythm & dynamics: an initially free, spacious tempo accelerates into irresistible momentum, mirroring technology-driven social change; a crescendo from pp to f heightens tension.
  • Melody & harmony: glissandi and subtle vibrato suggest sonic exploration; layered orchestral writing enriches the harmonic field.

Characteristics & details:

  • Glissando with vibrato: glissando alludes to traditional operatic inflection, while vibrato adds a contemporary sheen—merging operatic lineage with futurism.
  • Semi-vocal choir: textless humming and breathy colors simulate mechanized ambience, reflecting the coexistence of technology and humanity; an alternative, texted version also exists.
  • Bass sheng ↔ zhonghu dialogue: sustained bass-sheng tones offset the zhonghu’s terse figures, emblematic of tradition meeting modern impetus.
Finale of «Collective Wisdom and Harmony»

The finale summons Chinese bass drum, cymbals, and full orchestra to forge a solemn, majestic atmosphere—an image of strength through shared knowledge.

  • Rhythm & melody: uniform triplet figures drive the close; the melody returns to the pentatonic frame while modern harmonies infuse fresh energy.
  • Performance profile: the coda crystallizes the work’s themes—Shanghai as diverse, inclusive, and innovative, with progress anchored in cultural exchange.

Characteristics & details:

  • Return to pentatonic: the closing gesture re-centers the traditional scale, affirming that innovation flourishes without severing its cultural roots.

3.2. «Where Dreams Begin!» — A Dreamlike Fusion of Soprano and Ruan Ensemble

1) Overview
«Where Dreams Begin!» is a composition for soprano and ruan ensemble, lasting approximately 4 min. 30 sec. It is performed by the Jinqi Ruan Orchestra of the Shanghai Conservatory of Music under the direction of Professor Wu Qiang.

2) Instrumentation & Structure

  • Core instruments: ruan family including liuqin, zhongruan, and daruan.
  • Soprano role: may be presented solo or with light support to spotlight the primary melodic line.
  • Form: sectional design with evolving dynamics and affect; selective passages permit free repetition, optional percussion, and experimental effects to suggest improvisatory flow.

3) Musical features

  • Melody & harmony: the soprano traces supple, cantabile lines focused on expression; the ensemble enriches with glissando and vibrato, shaping a fluid, dreamlike aura.
  • Rhythm & meter: flexible pacing with passages of freer interpretation; contrast between brisk and lento sections heightens rhythmic profile.
  • Timbre & technique: plucked articulation, glissandi, and harmonics highlight the palette; dialog between low registers (e.g., daruan) and high voices (e.g., liuqin) adds depth and resonance.

3.3. «Where Dreams Begin!» — Guzheng & Choir:
A Modern Interpretation of Traditional Instruments

1) Overview
An alternate version of «Where Dreams Begin!» is written for guzheng ensemble with female choir and sheng. Around 20 guzheng players perform together with choral chant; the choir often carries the principal melodic thread, engaging in a responsive dialogue with the ensemble.

2) Musical features

  • Innovative tradition: the guzheng is reframed through ensemble textures and choral color, blending classic timbres with contemporary sensibility for a unified modern-traditional soundworld.

3) Emotional expression

  • Emphasis on guzheng tone and lyrical contours, interwoven with choral chant, creates a soundscape that feels both profound and hopeful.
04.

Recording Venue:
Acoustic Renovation and Technological Innovation

4.1. Acoustic Renovation of the Chinese Music Rehearsal Hall

Before renovation, the hall suffered from a low ceiling, uneven sound distribution, and excessive echo. The project team implemented targeted strategies to resolve these issues:

  • 1) Front wall diffusion treatment: irregular hard-surface diffusers near the conductor’s area scatter early energy, minimizing strong specular reflections and improving spatial uniformity.
  • 2) Side & rear wall absorption: broadband absorbers temper primary reflections and balance timbre—especially for high-SPL sections such as percussion and suona.
  • 3) Ceiling renovation & height increase: removal of the suspended ceiling raised the height to 5–6 m; the new upper surface incorporates absorption to reduce harshness from early overhead reflections.

4.2. Application of WFS Spatial Acoustic Enhancement Technology

Wave Field Synthesis (WFS), grounded in Huygens’ principle, controls loudspeaker-array outputs to reconstruct desired sound fields—playing a pivotal role in the hall’s upgrade:

  • 1) Virtual sound field reconstruction: simulation of target concert-hall acoustics within the rehearsal room enhances recording conditions to better mirror live performance environments.
  • 2) Field uniformity & directional control: phase–amplitude steering across arrays yields even spatial coverage and optimized directivity, reducing environmental interference and improving clarity.
  • 3) Precise localization & immersion: accurate spatial rendering of source positions sharpens image stability and deepens immersion for listeners and performers.
  • 4) Instrumental sound-pressure balancing: WFS compensation elevates solo lines within the ensemble image, supporting a more balanced acoustic experience.
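
The phase–amplitude steering described above can be illustrated with a deliberately simplified sketch. It assumes a virtual point source behind a straight loudspeaker row and computes only a relative delay and a distance-based gain per loudspeaker. A full WFS driving function also applies a spectral pre-filter, angle weighting, and edge tapering, all of which are omitted here. The function name, array geometry, and speed of sound are illustrative assumptions, not values from the project.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def wfs_delays_and_gains(source_xy, speaker_positions):
    """Return one (delay_seconds, gain) pair per loudspeaker.

    Simplified delay-and-attenuate model: each loudspeaker fires later
    and quieter the farther it is from the virtual source, so the
    individual wavefronts add up to one that appears to come from it.
    """
    distances = [math.dist(source_xy, pos) for pos in speaker_positions]
    nearest = min(distances)
    result = []
    for d in distances:
        delay = (d - nearest) / SPEED_OF_SOUND  # relative to the nearest unit
        gain = math.sqrt(nearest / d)           # simplified 1/sqrt(r) decay
        result.append((delay, gain))
    return result


# Hypothetical example: 8 loudspeakers, 0.2 m apart, on the line y = 0,
# with the virtual source 1 m behind the centre of the row.
speakers = [(i * 0.2 - 0.7, 0.0) for i in range(8)]
for (x, _), (delay, gain) in zip(speakers, wfs_delays_and_gains((0.0, -1.0), speakers)):
    print(f"x = {x:+.1f} m  delay = {delay * 1000:.3f} ms  gain = {gain:.3f}")
```

The loudspeakers nearest the virtual source fire first at full level, and the outer ones follow slightly later and quieter, which is what bends the combined wavefront into the shape a real source at that position would produce.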

To meet algorithmic demands, the team developed a 3D e-Reflective Loudspeaker Panel system: a high-density array of 624 loudspeakers, including 40 wall-mounted units and 10 ceiling-mounted dual-layer modules, enabling precise, immersive WFS for both rehearsal and recording.

A suspended microphone array provides real-time 3D feedback on reflection characteristics of the target hall, allowing realistic performance simulation and on-the-fly acoustic adjustments. AI-driven multivariable models automate sound-field regulation, improving consistency between rehearsal conditions and live performance outcomes.

WFS D12 e-Reflective Loudspeaker Panel.
05.

Design and Recording Scheme for Sound Pickup

For «Sound of Shanghai: Opera of a Cosmopolis», the author developed specialized sound pickup and recording schemes tailored to the characteristics of each piece.

5.1. Microphone Design for «Welcome to the Future!»

Seating Plan — Chinese Orchestra with 100 Musicians.

1) 3D microphone design: Eight sE8 small-diaphragm cardioid condensers were strategically suspended above the orchestra to capture a comprehensive sound image. A stereo pair in DIN configuration was positioned above the conductor’s head, with two additional side microphones suspended to the left and right. A further four sE8 matrix points were evenly distributed above the string section, ensuring accurate localization and an immersive orchestral capture.

Microphone pickup-point design for overhead hanging above the orchestra.
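
To show what a DIN pair encodes, the sketch below estimates the level and time differences between its two channels for a distant source. It assumes the standard DIN geometry (two cardioids 20 cm apart with a 90° included angle), ideal cardioid patterns, and far-field sound. The source does not state the exact dimensions used in the session, and the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0               # m/s, assumed
DIN_SPACING_M = 0.20                 # capsule spacing of the standard DIN arrangement
DIN_HALF_ANGLE = math.radians(45.0)  # each capsule turned 45 degrees outward


def cardioid_gain(off_axis_rad):
    """Ideal cardioid sensitivity: 1 on axis, 0 directly behind."""
    return 0.5 * (1.0 + math.cos(off_axis_rad))


def din_pair_differences(azimuth_deg):
    """Return (level_db, lead_ms) for a far-field source.

    azimuth_deg is positive toward the left microphone. level_db is the
    left-minus-right level difference; lead_ms is how much earlier the
    sound reaches the left capsule.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie in the frontal range -90..90 degrees")
    az = math.radians(azimuth_deg)
    left = cardioid_gain(az - DIN_HALF_ANGLE)
    right = cardioid_gain(az + DIN_HALF_ANGLE)
    level_db = 20.0 * math.log10(left / right)
    lead_ms = 1000.0 * DIN_SPACING_M * math.sin(az) / SPEED_OF_SOUND
    return level_db, lead_ms


for azimuth in (0.0, 15.0, 30.0, 45.0):
    level, lead = din_pair_differences(azimuth)
    print(f"{azimuth:4.0f} deg  level = {level:5.2f} dB  lead = {lead:5.3f} ms")
```

Because the pair is near-coincident, both cues grow together as a source moves off centre, which is one reason such pairs tend to localize stably.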

2) Ground microphones: Three sE8 Omni PZM boundary microphones were positioned at the front of the orchestra to capture energy radiating along the floor from instruments such as the erhu and pipa. This configuration secures the lower-frequency resonance and the characteristic tonal qualities these instruments produce.

sE8 Omni PZM boundary microphones.

3) Spot Microphone Design: An sE8 XY-based recording setup was chosen for the guzheng, as this method captures a broad soundstage while remaining compatible with mono spot microphone recordings. The sE8 XY configuration enhances the retention of the guzheng’s tonal subtleties and integrates seamlessly into the overall sound field of the orchestra. It ensures a spatially immersive audio experience, while the use of the sE8, a commonly used recording microphone, guarantees a natural and authentic reproduction of the instrument’s sound.

An sE8 XY-based recording setup.
sE Electronics sE8.

Harp: An sE RN17 Neve transformer small-diaphragm condenser microphone was selected to focus on the resonant cavity beneath the harp, minimizing unwanted crosstalk. This design optimizes sound capture by selecting the ideal microphone type and placement to reduce noise interference during performances. The RN17, with its directional and frequency response characteristics, is especially suited for instrument recording. It captures the harp’s rich tonal quality from within the resonant cavity while ensuring minimal intrusion from other sounds in the environment.

sE Electronics RN17.

Double Bass Group: A single sE8 Omni PZM boundary microphone was placed independently on the ground to capture the section’s low-frequency output. A PZM (pressure zone microphone) works on the boundary principle: with the capsule mounted at a reflective surface, direct and reflected sound arrive in phase, which avoids comb filtering and gives a full-range, hemispherical pickup well suited to low frequencies. This setup ensures that the rich, deep tones of the double bass are captured effectively, delivering high-quality material for the mixing engineer.

Timpani: Two sE VR2 ribbon microphones were positioned above the timpani in an AB configuration to capture the powerful, resonant percussion tones. The AB method uses two spaced microphones; the small time-of-arrival differences between them convey width and depth, and the two signals are balanced in post-production. This approach documents the timbral richness of the percussion, resulting in a more spatial and authentic audio experience.

sE Electronics VR2.
A pair of sE VR2 ribbon microphones for the timpani.

Bass Drum: A single sE V7 dynamic microphone was placed near the drumhead, chosen for its strong dynamic response, high sound-pressure handling, and ability to capture low-frequency transients. This setup ensures precise recording of the bass drum’s low-frequency detail while preserving its dynamics.

sE Electronics V7.

Pipa and Plucked Instruments: An sE 4400a large-diaphragm condenser microphone set to a figure-8 polar pattern was positioned so that both its front and rear lobes picked up sound. This setup strengthens the pickup of instruments such as the pipa and ruan, ensuring a more balanced and natural recording of their distinctive tonal qualities.

sE Electronics 4400a.

5.2. «Where Dreams Begin!»: Recording Plan for Soprano and Ruan Ensemble

Seating Plan – Ruan Ensemble.
Ruan Ensemble and Kunqu opera soprano at the recording site.

1) 3D Microphone Design: Eight sE8 small-diaphragm cardioid condenser microphones were suspended above the ensemble. A pair of DIN-configured sE8 microphones was positioned above the conductor’s head at the front, while another pair of side-spread sE8 microphones was hung on either side of the conductor. Additionally, four sE8 microphones were evenly distributed in a matrix formation above the string section to capture a balanced and immersive sound field.

2) Spot Microphone Design: The RN17 Neve transformer small-diaphragm condenser microphone was placed in front of the liuqin section to record its tonal details. One sE8 spot microphone was set for each section of the pipa and zhongruan. These microphones were strategically positioned to preserve the subtle acoustics of each instrument section, ensuring the purity of the sound quality. The sE8 spot microphone is known for its ability to capture performance details and deliver high-fidelity audio with its precise recording capabilities.

The sE Electronics RN17 and sE Electronics sE8 are set up in front of the ruan ensemble.

3) Vocal Recording: To achieve optimal vocal pickup and separation, the sE VR1 ribbon microphone was used, with its placement and angle fine-tuned through on-site testing. The microphone was connected to a Neve 5017 single-ended discrete pure analog preamp, which contributed to a warm and sweet vocal tone. The sE VR1, known for its distinctive ribbon design, highlights nuanced characteristics, enhancing vocal clarity and separation. Paired with the Neve 5017 preamp, this setup enriches the sound, providing a warmth and depth that elevates the overall recording.

5.3. «Where Dreams Begin!» – Recording Plan for Guzheng Ensemble and Choir

1) 3D Microphone Design: Eight sE8 small-diaphragm cardioid condenser microphones were suspended above the ensemble. A pair of DIN-configured sE8 microphones was positioned above the conductor’s head at the front, while another pair of side-spread sE8 microphones was hung on either side of the conductor. Additionally, four sE8 microphones were evenly distributed in a matrix formation above the string section to capture a balanced and immersive sound field.

2) Choir Recording: For optimal group dynamics and to convey the atmosphere during the simultaneous guzheng performance and choir singing, a pair of sE VR2 ribbon microphones was mounted on a high stand at the front of the ensemble, approximately 2.5 meters above the ground. This configuration ensured a clear capture of each choir member’s voice, resulting in a natural and cohesive choir effect. The setup also minimized unwanted reverb and noise, preserving clarity and purity. The sE VR2 microphones, valued for their directionality and sensitivity, captured the finest nuances of each note, delivering a precise and vivid choir recording.

A pair of sE Electronics VR2 microphones is positioned in front of the guzheng ensemble.

3) Solo Recording: To emphasize Professor Qi Yao’s accompanying vocals, a dedicated sE RN17 small-diaphragm condenser microphone was independently positioned on the conductor’s podium. This high-precision microphone is designed to deliver faithful sound reproduction, with a particular focus on enhancing the vocal performance.

5.4. Kunqu Opera Vocal Recording

A section of Kunqu opera vocals was recorded using the sE NEVE Series RNT tube microphone for a cappella sampling. The performer, Zhao Jinyu — China’s cultural ambassador for Kunqu opera and director of the Shanghai Kunqu Peng School Art Study Center — is dedicated to the preservation and promotion of Kunqu opera. She apprenticed under Kunqu artist Zhang Xunpeng, specializing in the Peng School’s boudoir role performance style. Zhao’s voice is bright, sweet, melodious, and highly expressive.

Kunqu Opera singer Zhao Jinyu at the recording site.

Originating in the late Yuan and early Ming dynasties as the Kunshan tune, Kunqu opera was refined by Wei Liangfu into the gentle and intricate Shuimo tune. Zhao Jinyu’s vocal expression is an embodiment of the Shuimo tune, characterized by meticulous articulation and extended breath control, excelling in nuanced emotional expression. In her performance, she employs Kunqu-specific techniques, such as soutone and pause-accent, which add charm and resonance to her singing.

The sE NEVE Series RNT tube microphone, with its warm and soft tonal characteristics, is well-suited for capturing the delicate details and subtle shifts in Kunqu opera vocals. This microphone faithfully reproduces Zhao Jinyu’s bright and melodious voice, showcasing the gentle, intricate beauty of the Shuimo tune. Its performance ensures a refined recording, offering listeners an artistically evocative auditory experience. Through the RNT’s recording, Zhao Jinyu’s a cappella singing not only highlights the uniqueness of Kunqu opera vocals but also exemplifies the seamless fusion of traditional art and modern technology.

sE Electronics RNT.
06.

Spatial Audio Production:
The Fusion of Technology and Art

Spatial audio production was carried out at the Full Dimension Multimedia Spatial Audio Workstation, with support from spatial audio recording engineer Iverson Xu and in collaboration with the team from the Shanghai Conservatory of Music. The 3D microphone recordings supply the spatial information, and object-based audio production builds on them. Unlike traditional channel-based audio, where sounds are mixed into fixed output channels, object-based audio keeps each sound source as an independent object and attaches metadata describing it (such as 3D position, size, or movement). These objects are then distributed, and during playback the system renders the sound in real time from the metadata.

Throughout the spatial audio mix, Sennheiser HD800 open-back monitoring headphones, paired with a head tracker and connected to the SVS (Sound Room Virtualization System) algorithm, were used for spatial sound checks. This workflow restores each instrumental part to its proper position in three-dimensional XYZ space, creating an immersive live acoustic environment.
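
As a minimal sketch of the object-based idea, the example below stores a sound as an object with position metadata and derives loudspeaker gains only at render time. The `AudioObject` class, the `render_gains` function, the distance-based panning rule, and the five-loudspeaker layout are all illustrative assumptions. They are not the Dolby Atmos or IOSONO renderers, whose algorithms the source does not describe.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """One sound source plus the metadata a renderer needs."""
    name: str
    position: tuple  # (x, y, z) in metres, hypothetical room coordinates
    gain: float = 1.0


def render_gains(obj, speakers, rolloff=1.0):
    """Distance-based amplitude panning with constant total power.

    Loudspeakers closer to the object receive more signal. The weights
    are normalised so the summed power equals obj.gain squared.
    """
    weights = [
        1.0 / (math.dist(obj.position, spk) + 1e-3) ** rolloff  # 1e-3 avoids division by zero
        for spk in speakers
    ]
    norm = math.sqrt(sum(w * w for w in weights))
    return [obj.gain * w / norm for w in weights]


# Hypothetical playback layout: four ear-level loudspeakers and one overhead.
speakers = [(-2.0, 2.0, 1.2), (2.0, 2.0, 1.2), (-2.0, -2.0, 1.2),
            (2.0, -2.0, 1.2), (0.0, 0.0, 2.8)]
erhu = AudioObject("erhu", (-1.5, 1.5, 1.2))
print([round(g, 3) for g in render_gains(erhu, speakers)])
```

The same object can be rendered to a different loudspeaker layout simply by passing another `speakers` list, which is the practical advantage of keeping position as metadata instead of baking it into channels.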

Spatial audio production for «Sound of Shanghai: Opera of a Cosmopolis»
at the recording site of the studio at the Shanghai Conservatory of Music
WFS-assisted spatial production session, with the composer monitoring through Sennheiser HD800 headphones equipped with a head tracker.

Spatial audio, as an emerging form of music production, is gradually revolutionizing the traditional stereo auditory experience. It no longer confines itself to mixing the left and right channels but instead aims to reconstruct a three-dimensional, or even higher-dimensional space in the auditory realm, drawing the listener into a realistic and captivating sound field. However, spatial audio production is not simply a stacking of technologies; at its core lies a spatial philosophy and a profound understanding of the complex relationships between the recording venue, the auditory space, and their interplay.

Traditional stereo recording treats the venue as a relatively static collection of sound sources, simulating their positions and distances through left and right channel distribution. In contrast, spatial audio production acknowledges a more intricate reality: the spatial attributes of the recording venue and the auditory space are often physically inconsistent. This inconsistency manifests in the sound field, auditory dimensions, microphone pickup locations, and speaker playback positions. While recording venues possess inherent acoustic characteristics that shape their sound fields, the auditory space is defined by the speaker system, listening environment, and audio signal. Differences in their acoustic traits directly influence the auditory experience.

Moreover, while recording venues are typically three-dimensional spaces, auditory spaces can extend into higher dimensions. Using suspended microphones, main and auxiliary microphones, and ground PZM microphones, spatial information about sound source positions in both horizontal and vertical dimensions can be captured. Although some spatial information loss is inevitable during recording and playback, certain details can be emphasized through post-production.

Addressing the physical inconsistencies between the recording venue and the auditory space, the core of spatial audio production lies in establishing an effective mapping relationship — converting the spatial information of the recording venue into auditory experiences within the sound space. This mapping is not a simple replication; it requires creative reconstruction and reinterpretation of the space. Spatial transformation can generate surreal auditory experiences, while spatial fusion creates distinctive auditory landscapes. Spatial narrative, in turn, establishes immersive auditory experiences. It integrates the philosophy of spatial folding into its creation, breaking the limitations of physical space to deliver a completely new auditory experience.
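
The simplest form of such a mapping is proportional scaling between the two spaces, sketched below. This is only an illustration of the concept and not the mapping used in the project; the function name and both sets of room dimensions are hypothetical. Creative reconstruction of the kind described above would replace this linear rule with something freer.

```python
def map_position(position, venue_dims, playback_dims):
    """Scale an (x, y, z) position from venue to playback coordinates.

    Each axis is normalised by the venue size and re-expanded to the
    playback-room size, so relative placement is preserved.
    """
    return tuple(
        p / venue * playback
        for p, venue, playback in zip(position, venue_dims, playback_dims)
    )


# Hypothetical dimensions in metres: a 20 x 10 x 6 hall mapped to an 8 x 6 x 3 room.
venue = (20.0, 10.0, 6.0)
room = (8.0, 6.0, 3.0)
print(map_position((10.0, 5.0, 3.0), venue, room))  # centre of the hall -> (4.0, 3.0, 1.5)
```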

Spatial folding refers to bending or folding a space so that different parts connect or come closer together. In spatial audio production, it can merge various recording venues into a new space, overlay sounds from different temporal points to create a temporal space, and map different auditory dimensions to construct a higher-dimensional sound field. The philosophical significance of spatial folding challenges our conventional understanding of space, presenting it not as static or fixed, but as a malleable, dynamic entity capable of being bent, folded, and reconstructed. Through spatial folding, we transcend the constraints of physical space, creating an experience that pushes the boundaries of imagination and reality.

During production, effects processors were used sparingly so that the sound remained as natural and authentic as possible, with the emphasis on the spatiality and layering inherent in the music itself. The recording engineer also rebalanced the parts according to his reading of the full score, applying specific dynamic adjustments to the low-frequency instrument sections to prevent muddiness and congestion. For solo instruments, automation was used: the recording engineer and music producer shaped the envelope curves in detail, guided by their emotional understanding of the piece. This stage represents the engineer’s «tertiary creation», following composition and performance, and enhances the expressiveness of the music.
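
Automation of this kind is commonly stored as a list of breakpoints that the workstation interpolates during playback. The sketch below shows that idea with linear interpolation in decibels; the breakpoint values and function names are invented for illustration and do not come from the session.

```python
def gain_db_at(breakpoints, t):
    """Interpolate the automation gain (dB) at time t in seconds.

    breakpoints is a list of (time_s, gain_db) pairs with strictly
    increasing times. Outside the list, the first or last value is held.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    if t >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (t0, g0), (t1, g1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)


def db_to_linear(gain_db):
    """Convert a gain in dB to the linear factor applied to samples."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical solo envelope: rise into a phrase, then ease back.
solo_envelope = [(0.0, -6.0), (2.0, 0.0), (4.0, -3.0)]
for t in (0.0, 1.0, 3.0, 5.0):
    g = gain_db_at(solo_envelope, t)
    print(f"t = {t:.1f} s  gain = {g:+.1f} dB  factor = {db_to_linear(g):.3f}")
```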

In this project, the final master output included two formats: Dolby Atmos and IMF (IOSONO). The Dolby Atmos master file delivers a spatial audio effect on a 9.1.6 channel system, while the high-precision IMF master files are adaptable to future sound installation designs in any scenario or space.
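
For readers unfamiliar with the «9.1.6» label, the short sketch below reads it under the usual convention for such layouts: ear-level loudspeakers, then low-frequency effects (LFE) channels, then height loudspeakers. The function name is illustrative.

```python
def parse_layout(label):
    """Split a label such as '9.1.6' into its loudspeaker counts."""
    ear_level, lfe, height = (int(part) for part in label.split("."))
    return {
        "ear_level": ear_level,
        "lfe": lfe,
        "height": height,
        "total": ear_level + lfe + height,
    }


print(parse_layout("9.1.6"))  # total of 16 playback channels
```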

Spatial audio production for «Sound of Shanghai: Opera of a Cosmopolis»
at the Full Dimension Multimedia Studio in Shanghai —
Wave Field Synthesis (WFS) workflow in session.
07.

Conclusion

«Sound of Shanghai: Opera of a Cosmopolis» represents a comprehensive fusion of music creation, recording technology, and cultural heritage. More than just a musical endeavor, it stands as a testament to interdisciplinary collaboration, reflecting the artistic vision of Marios Joannou Elia, who played a pivotal role as both composer and project director. It also highlights the strengths of the Shanghai Conservatory of Music in music education and research, with the Full Dimension Multimedia team and sE Electronics providing essential technical expertise and state-of-the-art equipment to bring this vision to life.

Beyond its technical achievements, the audio recording process was an exploration of musical aesthetics, philosophy, and sociocultural values. Professor Han Zhong’en of the Shanghai Conservatory of Music emphasizes the integration of sound and imagery, arguing that the highest form of music evokes emotional resonance and spiritual elevation. Professor Wang Defeng of Fudan University, through a philosophical lens, examines the interplay between the «thing-in-itself» and «phenomena.» While absolute reality remains elusive, art serves as a bridge, offering deeper insights into perception and meaning through sonic expression. Throughout the recording process, advanced technology was used to preserve the authenticity and nuances of the music, allowing listeners to engage in a more profound and contemplative auditory experience.

On a cultural and social level, this project carries significant value. By creatively reinterpreting traditional Chinese music through modern compositional and technological approaches, it contributes to the preservation and evolution of China’s musical heritage. At the same time, it serves as a practical case study for the development of China’s spatial audio standards, reinforcing the nation’s competitiveness in the global audio field. In an era of cultural fusion and diversity, such innovations help ensure that traditional Chinese music is not only preserved but also dynamically reimagined for contemporary and future audiences.

Beyond its artistic and cultural impact, the project also represents a breakthrough in spatial audio technology. «Sound of Shanghai: Opera of a Cosmopolis» introduces new standards in immersive sound production, particularly in Wave Field Synthesis (WFS), object-based spatial audio, and immersive sound environments. By integrating AI-driven audio processing and adaptive spatial techniques, it sets a precedent for future applications in the metaverse, virtual reality (VR), and artificial intelligence (AI)-enhanced audio experiences. As the metaverse evolves, spatial audio is becoming a cornerstone of immersive environments, enabling realistic concert simulations, interactive museum exhibits, and AI-driven soundscapes. This project’s object-based WFS approach could serve as a model for future virtual performances and adaptive sound installations.

Ultimately, this project demonstrates the boundless potential of music, technology, and cultural exchange. By bridging tradition and innovation, past and future, East and West, it paves the way for new artistic possibilities in the rapidly expanding landscape of spatial audio and storytelling.

Composer and director Marios Joannou Elia, spatial audio recording engineer Iverson Xu (Zhende Xu), and the various types of microphones used during the recording.