Voice + Research Lab

Goals

This project proposes an ongoing interdisciplinary inquiry into the human voice to address a primary research question: How do young people perceive and interpret emotional timbre in human voices compared to AI-generated voices? The collaboration between linguistics and voice studies models arts-integrated learning that pushes students to bridge analytical and creative modes of thought. Students will collect and collate recordings of voices to examine how listeners (aged 10-25) distinguish emotional timbre in human voices across age, language, and identity. The result will be an interactive sound installation entitled Voice Boxes, with the goal of establishing a permanent Voice Museum at Georgia Tech.

Issues Involved or Addressed

Voice-cloning Artificial Intelligence (AI) has achieved remarkable accuracy in reproducing human speakers' vocal identities, capturing their acoustic-articulatory patterns, cadence, pitch, and volume, and these systems can now replicate the prosody of original speech from reference audio with striking fidelity. As researchers, we seek to understand how this technological advancement influences younger listeners' abilities to recognize, interpret, and distinguish authentic human vocal timbre and emotional expression. While AI-generated voices may sound convincingly human in their prosodic mimicry of intonation, cadence, stress, tempo, and phrasing, our study explores how listeners perceive emotional qualities differently in AI and human voices, with particular focus on timbre, perhaps the most elusive and difficult dimension of vocal expression to define and measure.

In an age when AI-generated voices saturate digital environments, from Siri and Alexa to TikTok narrations and automated captions, young listeners are increasingly exposed to synthetic rather than organic human speech. This study explores whether such exposure affects the ability to interpret emotional expression in authentic human voices. We focus particularly on vocal timbre, the subtle quality, or color, that distinguishes one voice from another and conveys affective depth beyond words and prosody. By comparing listeners' emotional recognition accuracy across human and AI-generated speech samples, we aim to assess how digital listening habits may be reshaping auditory empathy and the perception of authenticity in the human voice.

We propose to gather about 40 hours of vocal recordings, both human and AI-generated, across adolescent, elderly, and multilingual registers, and to present them in an intimate artistic and acoustic setting that invites interpretive listening. In each box, visitors encounter both human and AI voices and a mix of ages and languages. The intimate setting of the box is intended to limit the other senses and heighten intentional listening to timbre, the tone colors and textures of a sound.
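
As a purely illustrative example, the minimal Python sketch below shows one way the human-versus-AI recognition comparison described above could be analyzed. It assumes a hypothetical response table (listening_study_responses.csv) with columns listener_id, voice_type, intended_emotion, and perceived_emotion; the actual data format and analysis plan will be developed by the team.

```python
import pandas as pd
from scipy import stats

# Hypothetical response table: one row per listening trial.
# Assumed columns: listener_id, voice_type ("human" or "ai"),
# intended_emotion, perceived_emotion.
responses = pd.read_csv("listening_study_responses.csv")

# A trial is "correct" when the listener's perceived emotion matches
# the emotion the recording was intended to convey.
responses["correct"] = responses["perceived_emotion"] == responses["intended_emotion"]

# Per-listener recognition accuracy, separately for human and AI voices.
per_listener = (
    responses.groupby(["listener_id", "voice_type"])["correct"]
    .mean()
    .unstack("voice_type")
    .dropna()
)

# Paired comparison: does the same listener identify emotion more
# accurately in human voices than in AI-generated ones?
t_stat, p_value = stats.ttest_rel(per_listener["human"], per_listener["ai"])

print(per_listener.mean())            # mean accuracy per voice type
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

This sketch treats each listener as their own control (a paired comparison), which fits a design where the same participant hears both human and AI samples; a between-groups design would call for a different test.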

Methods and Technologies

  • Interdisciplinary Voice Studies

Majors Sought

Business: Marketing, Strategy and Innovation

Design: Music Technology

Engineering: Computer Engineering, Electrical Engineering

Liberal Arts: Applied Languages and Intercultural Studies, Chinese, Computational Media, Digital Media, Economics, Economics and International Affairs, Film and Media Studies, French, German, Global Economics and Modern Languages, Global Media and Cultures, History, Technology, and Society, International Affairs, International Affairs and Modern Languages, International Affairs, Science, and Technology, Japanese, Korean, Literature, Media, and Communication, Public Policy, Spanish

Preferred Interests and Preparation

General or Major-Specific

Advisors

Andrea Jonsson
ajonsson7@gatech.edu

Hongchen Wu
Modern Languages
hwu480@gatech.edu

Day, Time & Location

Full Team Meeting:
9:30-10:20 Thursday
Van Leer 465

Subteam meetings will be scheduled after classes begin.