A Novel Coronavirus Speech Dataset: CoSDa
Published in Proceedings of the 2021 Innovations in Intelligent Systems and Applications Conference (ASYU), 2021
This paper introduces CoSDa, a novel coronavirus (COVID-19) speech dataset for training machine learning models. To our knowledge, CoSDa is the first publicly available Turkish speech dataset that includes speech recordings of COVID-19 patients. The dataset was collected from YouTube videos of healthy people and COVID-19 patients. It provides 99 healthy (COVID-19 negative) and 92 unhealthy (COVID-19 positive) Turkish speech recordings. The EfficientNet and ResNet (Residual Network) architectures were used as two-class classifiers to distinguish the speech of COVID-19 patients from that of healthy people. The ResNet-152 classifier achieves the best results, with an accuracy of 99.64%, an F1-score of 99.51, a precision of 0.9950, and a recall of 0.9959.
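For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are commonly computed for a two-class classifier. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and are not taken from the paper; the counts only reuse the class sizes stated above (92 positive, 99 negative) for illustration, and the paper's own evaluation protocol may differ.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard two-class metrics from confusion-matrix counts.

    tp: COVID-19 positive samples predicted positive
    fp: healthy samples predicted positive
    tn: healthy samples predicted healthy
    fn: COVID-19 positive samples predicted healthy
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT results from the paper): 92 positive and
# 99 negative recordings, with two missed positives and one false alarm.
metrics = binary_metrics(tp=90, fp=1, tn=98, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick sanity check when reading metric tables.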
Recommended citation: Yılmaz, G., Tektaş, R., Kazan, M.S., Ndigande, A.P. and Pehlivanoğlu, M.K., 2021, October. A Novel Coronavirus Speech Dataset: CoSDa. In 2021 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-4). IEEE.