Resume Classification System using Natural Language Processing and Machine Learning Techniques

  • Irfan Ali, Center of Excellence for Robotics, Artificial Intelligence, and Blockchain, Department of Computer Science, Sukkur IBA University, Sukkur-65200, Sindh, Pakistan.
  • Nimra Mughal, Center of Excellence for Robotics, Artificial Intelligence, and Blockchain, Department of Computer Science, Sukkur IBA University, Sukkur-65200, Sindh, Pakistan.
  • Zahid Hussain Khan, Center of Excellence for Robotics, Artificial Intelligence, and Blockchain, Department of Computer Science, Sukkur IBA University, Sukkur-65200, Sindh, Pakistan.
  • Javed Ahmed, Center of Excellence for Robotics, Artificial Intelligence, and Blockchain, Department of Computer Science, Sukkur IBA University, Sukkur-65200, Sindh, Pakistan.
  • Ghulam Mujtaba, Center of Excellence for Robotics, Artificial Intelligence, and Blockchain, Department of Computer Science, Sukkur IBA University, Sukkur-65200, Sindh, Pakistan.

Abstract

Selecting a suitable job applicant from a pool of thousands of applications is often a daunting task for an employer. Categorizing job applications submitted as resumes against the available vacancies takes significant time and effort. A Resume Classification System (RCS) built on Natural Language Processing (NLP) and Machine Learning (ML) techniques could automate this tedious process. Moreover, automation can significantly expedite the applicant screening process and make it more transparent, with minimal human involvement. This experimental study presents an automated NLP- and ML-based RCS that classifies resumes according to job categories with performance guarantees. The study employs various ML algorithms and NLP techniques to measure the accuracy of the RCS and proposes a solution with better accuracy and reliability in different settings. To demonstrate the significance of NLP and ML techniques for RCS, the extracted features were evaluated on nine ML classification models: four Support Vector Machine (SVM) variants (Linear, SGD, SVC, and NuSVC), three Naïve Bayes variants (Bernoulli, Multinomial, and Gaussian), K-Nearest Neighbor (KNN), and Logistic Regression (LR). The Term Frequency-Inverse Document Frequency (TF-IDF) feature representation scheme proved suitable for RCS. The developed models were evaluated using the confusion matrix, F-score, recall, precision, and overall accuracy. The experimental results indicate that, with the One-vs-Rest classification strategy for this multi-class resume classification task, the SVM family of classifiers performed better than the other classifiers on the study dataset of more than 960 parsed resumes, achieving more than 96% accuracy. These promising results suggest that the NLP and ML techniques employed in this study could be used to develop an efficient RCS.
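
To make the pipeline described in the abstract concrete, the sketch below turns resume text into TF-IDF vectors, classifies them, and computes accuracy and per-class precision, recall, and F-score. It is an illustration under stated assumptions, not the authors' implementation. The toy resumes and category labels are hypothetical. The smoothed IDF formula is one common variant, and the abstract does not say which variant the study used. For brevity, a cosine-similarity KNN classifier (one of the nine model families listed) stands in for the SVM classifiers that the study found most accurate. Only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=1):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(
        zip((cosine(query, v) for v in train_vecs), train_labels), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f_score)} from true/predicted labels."""
    metrics = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f_score)
    return metrics


# Hypothetical toy data; the study itself used more than 960 parsed resumes.
train_docs = [
    "python developer with machine learning and data analysis experience",
    "data scientist skilled in statistics python and deep learning",
    "recruiter experienced in hiring onboarding and employee relations",
    "human resources manager handling payroll hiring and training",
    "civil engineer experienced in construction site supervision and autocad",
    "structural engineer with bridge design and construction experience",
]
train_labels = ["Data Science", "Data Science", "HR", "HR",
                "Civil Engineering", "Civil Engineering"]
test_docs = [
    "machine learning engineer using python for data pipelines",
    "hiring specialist for employee onboarding and payroll",
    "site engineer for construction and autocad drawings",
]
test_labels = ["Data Science", "HR", "Civil Engineering"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(doc, idf) for doc in train_docs]
predictions = [
    knn_predict(tfidf_vector(doc, idf), train_vecs, train_labels)
    for doc in test_docs
]
accuracy = sum(t == p for t, p in zip(test_labels, predictions)) / len(test_labels)

print("accuracy:", accuracy)
for label, (p, r, f) in per_class_metrics(test_labels, predictions).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f_score={f:.2f}")
```

In practice, a machine learning library would typically replace the hand-written vectorizer and classifier, and the classifier would be swapped for a linear SVM trained one-vs-rest, as the abstract describes. The feature extraction and evaluation steps would keep the same shape.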

Published
Jan 1, 2022
How to Cite
ALI, Irfan et al. Resume Classification System using Natural Language Processing and Machine Learning Techniques. Mehran University Research Journal of Engineering and Technology, [S.l.], v. 41, n. 1, p. 65 - 79, jan. 2022. ISSN 2413-7219. Available at: <https://publications.muet.edu.pk/index.php/muetrj/article/view/2353>. Date accessed: 18 jan. 2022.
This is an open access article published by Mehran University of Engineering and Technology, Jamshoro, under the CC BY 4.0 International License.