Multiclass voice commands classification with multiple binary convolution neural networks
Jarosław Szkoła
University of Rzeszow
Abstract
Obtaining a good machine learning model requires training the network on a large dataset. Training is often slow, and any change to the input dataset requires re-training the entire network. Extending a model with new output (decision) classes is especially problematic, because the whole model must then be re-trained on all of the data. To improve this process, a new neural network architecture is proposed that allows existing models to be extended with new classes without re-training the entire network; training a sub-model for a new class takes much less time than re-training the whole network. The presented architecture is designed for data with at least two decision classes.
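The idea of extending a classifier by adding sub-models can be illustrated with a minimal sketch. It assumes one binary sub-model per class and a "highest score wins" vote, which is one plausible reading of the title and keywords rather than the author's exact design. A nearest-centroid scorer stands in for each binary convolutional network so that the example stays small, and the names `CentroidBinaryModel`, `BinaryEnsemble`, `cluster`, and the command labels are hypothetical. The sketch also assumes that samples of earlier classes are kept so they can serve as negative examples for a new sub-model.

```python
import numpy as np


class CentroidBinaryModel:
    """Stand-in for one binary CNN: scores how strongly x belongs to its class."""

    def __init__(self, positives, negatives):
        # "Training" here only stores the centroids of the two sample sets.
        self.pos = np.mean(positives, axis=0)
        self.neg = np.mean(negatives, axis=0)

    def score(self, x):
        # Higher when x is closer to the positive centroid than to the negative one.
        x = np.asarray(x, dtype=float)
        return np.linalg.norm(x - self.neg) - np.linalg.norm(x - self.pos)


class BinaryEnsemble:
    """One binary sub-model per class; the highest-scoring sub-model wins the vote."""

    def __init__(self):
        self.models = {}  # label -> binary sub-model
        self.data = {}    # label -> samples, reused as negatives for other classes

    def _train(self, label):
        # One-vs-rest: samples of every other known class act as negatives.
        negatives = np.vstack([v for k, v in self.data.items() if k != label])
        return CentroidBinaryModel(self.data[label], negatives)

    def fit(self, data):
        # Initial training: one sub-model for every starting class.
        self.data = {k: np.asarray(v, dtype=float) for k, v in data.items()}
        for label in self.data:
            self.models[label] = self._train(label)

    def add_class(self, label, samples):
        # Only the new sub-model is trained; existing sub-models are left untouched.
        self.data[label] = np.asarray(samples, dtype=float)
        self.models[label] = self._train(label)

    def predict(self, x):
        return max(self.models, key=lambda label: self.models[label].score(x))


def cluster(cx, cy):
    # Four toy feature vectors placed symmetrically around (cx, cy).
    return [[cx - 0.5, cy], [cx + 0.5, cy], [cx, cy - 0.5], [cx, cy + 0.5]]


ensemble = BinaryEnsemble()
ensemble.fit({"yes": cluster(0, 0), "no": cluster(5, 0), "stop": cluster(0, 5)})
old_models = dict(ensemble.models)

# Extend the ensemble with a fourth command without re-training the first three.
ensemble.add_class("go", cluster(5, 5))
print(ensemble.predict([5, 5]))
```

In this toy setting the cost of adding a class is the cost of training one sub-model. The existing sub-models never saw the new class as a negative example, so how well their scores stay calibrated against it depends on the data and on the voting rule.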
Keywords:
multiclass convolution neural networks, voting decision mechanism, voice commands classification, multiclass classifier, sound wave processing and classification