Voice + Research Lab

Goals
This project proposes an ongoing interdisciplinary inquiry into the human voice to address a primary research question: How do young people perceive and interpret emotional timbre in human voices compared to AI-generated voices? The collaboration between linguistics and voice studies models arts-integrated learning that pushes students to bridge analytical and creative modes of thought. Students will collect and collate recordings of voices in order to examine how listeners aged 10-25 distinguish emotional timbre in the human voice across age, language, and identity. The result will be an interactive sound installation entitled Voice Boxes, with the goal of establishing a permanent Voice Museum at Georgia Tech.
Issues Involved or Addressed
Voice-cloning artificial intelligence (AI) can now reproduce a human speaker's vocal identity, capturing acoustic-articulatory patterns, cadence, pitch, and volume, with remarkable accuracy; working from reference audio, these systems replicate the prosody of the original speech with striking fidelity. As researchers, we seek to understand how this technological advance influences younger listeners' abilities to recognize, interpret, and distinguish authentic human vocal timbre and emotional expression. AI-generated voices may sound convincingly human in their prosodic mimicry of intonation, cadence, stress, tempo, and phrasing; our study explores how listeners perceive emotional qualities differently in AI and human voices, with a particular focus on timbre, perhaps the most elusive aspect of vocal expression to define and measure.

In an age where AI-generated voices saturate digital environments, from Siri and Alexa to TikTok narrations and automated captions, young listeners are increasingly exposed to synthetic rather than organic human speech. This study asks whether such exposure affects the ability to interpret emotional expression in authentic human voices. We focus particularly on vocal timbre: the subtle quality, often described as tone color, that distinguishes one voice from another and conveys affective depth beyond words and prosody. By comparing listeners' emotional recognition accuracy across human and AI-generated speech samples, we aim to assess how digital listening habits may be reshaping auditory empathy and the perception of authenticity in the human voice.

We propose to gather about 40 hours of vocal recordings, both human and AI-generated, across adolescent, elderly, and multilingual registers, and to present them in an intimate artistic and acoustic setting that invites interpretive listening. In each box, visitors encounter both human and AI voices and a mix of ages and languages. The intimacy of the box is intended to limit the other senses and heighten intentional listening to timbre, the tone color or texture of a sound.
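Because timbre is difficult to define and measure, one possible complement to the perceptual listening study is to extract simple acoustic descriptors from the human and AI-generated clips for side-by-side comparison. The sketch below is purely illustrative and not part of the project plan: it assumes Python with the librosa library, and the file names are placeholders.

```python
# Illustrative sketch: quantifying timbre-related features of a recording so
# human and AI-generated clips can be compared alongside listeners' ratings.
# Assumes librosa and numpy are installed; file names are placeholders.
import numpy as np
import librosa


def timbre_profile(path: str) -> dict:
    """Return summary timbre descriptors for one recording."""
    y, sr = librosa.load(path, sr=None, mono=True)

    # Spectral centroid and bandwidth track "brightness" and spectral spread,
    # two common acoustic correlates of perceived tone color.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)

    # MFCCs compactly summarize the spectral envelope (vocal-tract shaping).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    return {
        "centroid_hz_mean": float(np.mean(centroid)),
        "bandwidth_hz_mean": float(np.mean(bandwidth)),
        "mfcc_means": np.mean(mfcc, axis=1).round(2).tolist(),
    }


if __name__ == "__main__":
    # Placeholder paths; swap in recordings from the project's corpus.
    for clip in ("human_clip.wav", "ai_clip.wav"):
        print(clip, timbre_profile(clip))
```

Descriptors like these could later be correlated with listeners' emotional-recognition responses, but the choice of features and tools would be up to the team.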
Methods and Technologies
- Interdisciplinary Voice Studies
Majors Sought
Business: Marketing, Strategy and Innovation
Design: Music Technology
Engineering: Computer Engineering, Electrical Engineering
Liberal Arts: Applied Languages and Intercultural Studies, Chinese, Computational Media, Digital Media, Economics, Economics and International Affairs, Film and Media Studies, French, German, Global Economics and Modern Languages, Global Media and Cultures, History, Technology, and Society, International Affairs, International Affairs and Modern Languages, International Affairs, Science, and Technology, Japanese, Korean, Literature, Media, and Communication, Public Policy, Spanish
Preferred Interests and Preparation
General or Major-Specific
Advisors
Andrea Jonsson
ajonsson7@gatech.edu
Hongchen Wu, Modern Languages
hwu480@gatech.edu
Day, Time & Location
Full Team Meeting:
9:30-10:20 Thursday
Van Leer 465
Subteam meetings will be scheduled after classes begin.