No 37 (2021)

Building a Machine Learning Model on Breast Cancer Data with Focus on Cross Validation and Accuracy

Authors: Sagar Rai, Aditya Anand, Kunal Singh

Abstract: Breast cancer (BC) is one of the most prevalent cancers among women worldwide, accounting for a large share of new cancer cases and cancer-related deaths in women. This makes it a major health issue today. Early diagnosis greatly improves the prognosis and leads to a higher survival rate, mainly because it enables timely clinical treatment. In addition, correctly classifying benign (non-cancerous) tumors spares patients unnecessary treatment. Machine Learning (ML) can detect complex relationships and identify critical features, which gives it a major advantage over traditional methods for correctly classifying tumors. Research shows that an expert physician can diagnose breast cancer with an accuracy of 79%, whereas machine learning algorithms achieve an accuracy of 91% or above. In this project, we performed data pre-processing and feature selection on raw data collected from the UCI repository to obtain meaningful features. We then trained various ML models on the processed data to classify breast tumors as malignant (dangerous) or benign with high accuracy. The main aim of the study was to find an algorithm with both high accuracy and a high cross-validation score. K-fold cross-validation was used to test the trained models, to verify that the chosen model had neither high bias nor high variance. An application programming interface (API) built with Flask is also provided so that the trained model can be used from other programming languages.
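To make the K-fold procedure concrete, here is a minimal sketch in plain Python (standard library only). It is not the authors' code: the abstract does not say which models or libraries were used, so the nearest-centroid classifier and the helper names `k_fold_indices`, `fit_centroids`, `predict_centroid` and `cross_val_accuracy` are hypothetical stand-ins, and the points in the usage block are synthetic, not the UCI breast cancer records. In practice, scikit-learn's `KFold` and `cross_val_score` provide the same procedure for real models.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Vector = Sequence[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and split them into k folds.

    Fold sizes differ by at most one; every index lands in exactly one fold.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample each.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def fit_centroids(X: List[Vector], y: List[str]) -> Dict[str, List[float]]:
    """Hypothetical stand-in model: the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Return the label whose centroid is nearest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def cross_val_accuracy(X: List[Vector], y: List[str], k: int = 5, seed: int = 0) -> List[float]:
    """K-fold cross-validation: each fold is held out once and scored by accuracy."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train only on the other k-1 folds, then score on the held-out fold.
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, well-separated 2-D points; NOT the UCI breast cancer data.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(50)]
    y = ["benign"] * 50 + ["malignant"] * 50
    scores = cross_val_accuracy(X, y, k=5)
    print("per-fold accuracy:", [round(s, 3) for s in scores])
    print(f"mean = {mean(scores):.3f}, std across folds = {pstdev(scores):.3f}")
```

Each sample is held out exactly once, so every record contributes to both training and evaluation. The mean of the per-fold scores estimates accuracy, and a large spread between folds is one informal hint that performance depends heavily on which records the model happened to be trained on.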

Keywords: Breast Cancer, Machine Learning, Malignant, Benign, Tumor, Cross Validation, K-Folds, Flask, API

Pages: 242-248