tsahay

GuiTones-I: An Audio-Visual Database of Monophonic Guitar Tones

IEEE Citation: A. Aggarwal, R. Kumar, T. Sahay and M. Chandra, "GuiTones-I: An audio-visual database of monophonic guitar tones," 2016 IEEE Region 10 Conference (TENCON), Singapore, 2016, pp. 497-500.

URL: http://ieeexplore.ieee.org/document/7848049/

Abstract: Automatic music transcription (AMT) is considered one of the most complex problems in music information retrieval (MIR). Many attempts at transcribing polyphonic music have been made, but monophonic tones, which build melodious music pieces, remain largely untouched. The first step toward recognition and transcription of monophonic music pieces is the preparation of a database of isolated and continuous monophonic sounds. This paper introduces one such audio-visual database of monophonic guitar tones for multi-modal signal processing and evaluates it on a test case of both online and offline recognition and transcription. The database comprises recorded audio samples of the first nine frets for all six strings, together with images of the different ways in which the fretboard was held while playing these frets. In total, over 10,000 audio-visual samples have been recorded by 40 amateur and professional guitarists. The prepared database will be made available for free download under a CC BY-NC-SA 4.0 license and on DVDs at a nominal cost covering shipping and handling charges.
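
As a rough illustration of what "first nine frets for all six strings" covers, the sketch below lists the nominal fundamental frequency of each string-fret position. It assumes standard tuning (E2 A2 D3 G3 B3 E4) and twelve-tone equal temperament, neither of which is stated in the abstract; the names `OPEN_STRINGS_HZ`, `fret_frequency` and `tone_table` are hypothetical and are not part of the database.

```python
# Open-string fundamentals in Hz for standard tuning.
# ASSUMPTION: the abstract does not say which tuning was used.
OPEN_STRINGS_HZ = {
    "E2": 82.41,
    "A2": 110.00,
    "D3": 146.83,
    "G3": 196.00,
    "B3": 246.94,
    "E4": 329.63,
}


def fret_frequency(open_hz, fret):
    """Nominal frequency at a fret, one equal-tempered semitone per fret."""
    return open_hz * 2 ** (fret / 12)


def tone_table(max_fret=9):
    """Map (string name, fret number) to nominal frequency for frets 1..max_fret."""
    return {
        (name, fret): fret_frequency(hz, fret)
        for name, hz in OPEN_STRINGS_HZ.items()
        for fret in range(1, max_fret + 1)
    }


table = tone_table()
for (name, fret), hz in sorted(table.items(), key=lambda item: item[1]):
    print(f"{name} string, fret {fret}: {hz:.2f} Hz")
```

Under these assumptions there are 54 string-fret positions, some of which share a pitch (for example, the fifth fret of the A string sounds the same note as the open D string).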

Grid search analysis of nu-SVC for text-dependent speaker-identification

IEEE Citation: A. Aggarwal, T. Sahay, A. Bansal and M. Chandra, "Grid search analysis of nu-SVC for text-dependent speaker-identification," 2015 Annual IEEE India Conference (INDICON), New Delhi, 2015, pp. 1-5.

URL: http://ieeexplore.ieee.org/document/7443790/

Abstract: Recent research has firmly established the application of Support Vector Machines to speaker recognition. In this paper, we present the variations in efficiency of a model across different parameters of nu-SVC for text-dependent speaker identification. Radial Basis Function (RBF), sigmoid and polynomial kernels have been used for classification. A statistical comparison of the three kernels is presented, highlighting the dependence of each on SVM parameters such as gamma, degree of polynomial and nu. For feature extraction, LPC, MFCC and a combination of both have been employed. The RBF kernel was found to perform better than both the polynomial and sigmoid kernels for all feature extraction techniques, with the best efficiency for MFCC.

Presented with the "Best Paper Award"
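
The grid search described in the abstract above can be sketched as an exhaustive sweep over kernel-specific parameter grids. This is a minimal illustration, not the authors' code: the grid values, the names `PARAM_GRID`, `iter_configs` and `grid_search`, and especially `toy_score` are all assumptions. `toy_score` is an arbitrary placeholder and does not represent any measured accuracy.

```python
from itertools import product

# Hypothetical parameter grids; the paper's actual ranges are not given in the abstract.
PARAM_GRID = {
    "rbf": {"nu": [0.1, 0.3, 0.5], "gamma": [0.01, 0.1, 1.0]},
    "sigmoid": {"nu": [0.1, 0.3, 0.5], "gamma": [0.01, 0.1, 1.0]},
    "poly": {"nu": [0.1, 0.3, 0.5], "gamma": [0.01, 0.1, 1.0], "degree": [2, 3, 4]},
}


def iter_configs(grid):
    """Yield one dict per combination of kernel and parameter values."""
    for kernel, params in grid.items():
        names = sorted(params)
        for values in product(*(params[name] for name in names)):
            yield {"kernel": kernel, **dict(zip(names, values))}


def grid_search(grid, score):
    """Return the highest-scoring configuration and its score."""
    best = max(iter_configs(grid), key=score)
    return best, score(best)


def toy_score(config):
    """Placeholder objective (NOT a real accuracy): peaks at nu=0.3, gamma=0.1."""
    return -abs(config["nu"] - 0.3) - abs(config["gamma"] - 0.1)


best_config, best_score = grid_search(PARAM_GRID, toy_score)
print(best_config, best_score)
```

In a real experiment the `score` callable would train and evaluate a nu-SVC for each configuration, for instance by returning the cross-validated accuracy of scikit-learn's `NuSVC` on the extracted LPC or MFCC features.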

SVM and ANN: A comparative evaluation

IEEE Citation: T. Sahay, A. Aggarwal, A. Bansal and M. Chandra, "SVM and ANN: A comparative evaluation," 2015 1st International Conference on Next Generation Computing Technologies (NGCT), Dehradun, 2015, pp. 960-964.

URL: http://ieeexplore.ieee.org/document/7375263/

Abstract: Support vector machines (SVMs) are among the most robust classifiers for speech recognition. This paper compares one of the more contemporary methods of classification, the artificial neural network (ANN), with support vector machines and draws conclusions based on a comparison of accuracy. The neural network is a pattern network with varying numbers of hidden neurons and varying transfer functions. A C-support vector classifier is used with three different kernels and kernel parameters. MFCC has been used as the feature extraction technique for a noiseless database of 50 independent speakers. The best results were obtained for the SVM with the RBF kernel, compared with the bi-quadratic polynomial and sigmoid kernels and the pattern network.

Performance evaluation of artificial neural networks for isolated Hindi digit recognition with LPC and MFCC

IEEE Citation: A. Aggarwal, T. Sahay and M. Chandra, "Performance evaluation of artificial neural networks for isolated Hindi digit recognition with LPC and MFCC," 2015 International Conference on Advanced Computing and Communication Systems, Coimbatore, 2015, pp. 1-6.

URL: http://ieeexplore.ieee.org/document/7324099/

Abstract: Artificial neural networks (ANNs) are among the most robust classifiers, with a long-standing history of application to voice recognition. This paper presents a comparative study of two types of neural networks for isolated Hindi digit recognition. The two networks, pattern net and feed-forward net, have been used for digit classification with multiple combinations of transfer functions and hidden neurons. LPC, MFCC and combinations of both have been used as feature extraction techniques in the experiments. The results favor pattern net in all the tested cases. A noiseless database of 50 independent speakers has been used for simulation.

Presented with the "Best Paper Award"