The Short Message Service (SMS) has been a very popular communication channel among mobile phone users since its introduction. However, with the rise of web-based instant messaging applications, users no longer depend on it as they once did, and SMS has instead become a favored channel for spammers. In this work, a popular, readily available SMS data set is extended with regional spam and ham texts typed in English. The new data set is then preprocessed, features are extracted, and the messages are classified using three widely used classification algorithms, yielding an enriched recognition system better suited to identifying SMS spam in the Indian context. Experimental results, evaluated with a Monte Carlo approach, show that the support vector machine (SVM) performs most robustly among the classifiers used in our work.
Keywords: SMS spam; Spam filtering; Natural language processing; Supervised learning; Text classification.