AI-BASED DEEPFAKE AUDIO DETECTION – A REVIEW
Abstract
The rapid rise of deepfake audio presents a dual reality: it enables innovative applications such as voice assistants and accessibility tools, yet it poses severe risks to security and trust through fraud and misinformation. Modern systems can clone a voice from just a few seconds of audio, making real and synthetic speech hard to distinguish. This review examines machine learning methods for detecting deepfake audio, covering features such as Mel-frequency cepstral coefficients (MFCCs) and spectrograms, classifiers including Random Forests and convolutional neural networks (CNNs), and datasets such as Fake-or-Real (FoR) and ASVspoof. The results indicate that combining optimized features with advanced models significantly improves detection accuracy. We also discuss open challenges, including limited data diversity, adversarial attacks, and real-world scalability, along with ethical concerns. Our goal is to contribute to the development of reliable and practical detection systems.
Keywords: Deepfake audio, voice cloning, audio forensics, machine learning, MFCC, spectrogram, deepfake detection, Random Forest, convolutional neural networks (CNN).