Published: 2024-04-021

Audio stream analysis for deep fake threat identification

Karol Jędrasiak
Civitas et Lex
Section: Security Sciences
https://doi.org/10.31648/cetl.9684

Abstract

The article introduces a new method for identifying deepfake threats in audio streams, focusing on the detection of synthetic speech generated by text-to-speech algorithms. The method rests on two components: the Vocal Emotion Analysis (VEA) Network and a Supervised Classifier for Deepfake Detection. The VEA Network detects emotional nuances in speech, and the classifier uses these features to differentiate between real and fake audio. The approach exploits the inability of current deepfake algorithms to replicate the emotional complexity of human speech, adding a semantic layer to the detection process. The system's effectiveness was confirmed through tests on several datasets, including challenging real-world conditions simulated through data augmentation, such as added white noise. Results show consistently high accuracy across datasets and in noisy environments, particularly when the system is trained on noise-augmented data. By leveraging the emotional content of the voice together with advanced machine learning, the method offers a robust defense against audio manipulation, strengthening the integrity of digital communications amid the rise of synthetic media.
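The two-stage design described in the abstract can be illustrated with a minimal sketch: an emotion-feature extractor standing in for the VEA Network, followed by a supervised classifier that separates real from synthetic clips. Everything here is illustrative and hypothetical, not the paper's actual architecture or data: the `vea_features` function is a crude energy-contour proxy, and the "real"/"fake" clips are synthetic noise designed so that fake audio has a flatter emotional (energy) contour.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def vea_features(audio, frame_len=160):
    """Toy stand-in for the VEA Network: summary statistics of the
    frame-level log-energy contour, a crude proxy for emotional dynamics."""
    n = len(audio) // frame_len * frame_len
    frames = audio[:n].reshape(-1, frame_len)
    energy = np.log1p((frames ** 2).mean(axis=1))
    return np.array([energy.mean(), energy.std(), np.abs(np.diff(energy)).mean()])

def make_clip(frame_scales, frame_len=160):
    """Noise clip whose per-frame amplitude follows frame_scales."""
    noise = rng.standard_normal(len(frame_scales) * frame_len)
    return noise * np.repeat(frame_scales, frame_len)

# Illustrative data: "real" speech gets a strongly varying energy contour,
# "fake" speech a flat one, mimicking the claim that synthetic speech
# under-reproduces emotional variation.
real = [make_clip(rng.uniform(0.3, 1.5, 25)) for _ in range(40)]
fake = [make_clip(np.ones(25)) for _ in range(40)]

X = np.stack([vea_features(a) for a in real + fake])
y = np.array([1] * 40 + [0] * 40)  # 1 = real, 0 = fake

# Stage two: a supervised classifier over the emotion-derived features.
clf = LogisticRegression().fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

The noise-augmentation training mentioned in the abstract would correspond to adding white noise to the clips before feature extraction, so the classifier learns features that survive degraded recording conditions.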

Keywords:

audio modification detection, voice analysis, fake audio detection

How to cite

Jędrasiak, K. (2024). Audio stream analysis for deep fake threat identification. Civitas Et Lex, 41(1), 21–35. https://doi.org/10.31648/cetl.9684
