Machine Learning Projects (Healthcare)

Dataset:
This dataset was provided through Kaggle, originating from the UCI Machine Learning Repository, as part of one of its challenges. It contains records of breast cancer patients with tumours labelled malignant or benign.
Logistic regression is used to predict whether a given patient has a malignant or a benign tumour, based on the attributes in the dataset.
Code : Loading Libraries
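
The library-loading cell is not reproduced on this page; a plausible minimal version for this lesson (the exact imports in the original notebook may differ):

```python
import numpy as np                # numerical arrays
import pandas as pd               # tabular data handling
import matplotlib.pyplot as plt   # plotting
from sklearn.model_selection import train_test_split  # train/test splitting
```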

Code : Loading dataset
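
The loading cell is likewise missing. The original presumably reads the Kaggle CSV with something like `data = pd.read_csv('data.csv')`; as a runnable stand-in, this sketch rebuilds an equivalent 569 × 33 frame from scikit-learn's copy of the same UCI dataset (its feature names differ slightly from the CSV's):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

# In the lesson this is presumably: data = pd.read_csv('data.csv')
# Stand-in: rebuild an equivalent frame from sklearn's copy of the UCI data.
raw = load_breast_cancer()
data = pd.DataFrame(raw.data, columns=raw.feature_names)
data.insert(0, 'diagnosis', np.where(raw.target == 0, 'M', 'B'))  # 0 = malignant
data.insert(0, 'id', np.arange(len(data)))
data['Unnamed: 32'] = np.nan   # the empty trailing column present in the CSV
print(data.shape)              # (569, 33)
```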

Output :

Code : Dataset information
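
The cell behind the output below is presumably `data.info()`; a sketch using a stand-in frame built from scikit-learn's copy of the dataset:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Stand-in for the Kaggle CSV (column names differ slightly).
raw = load_breast_cancer()
data = pd.DataFrame(raw.data, columns=raw.feature_names)
data.insert(0, 'diagnosis', np.where(raw.target == 0, 'M', 'B'))
data.insert(0, 'id', np.arange(len(data)))
data['Unnamed: 32'] = np.nan

data.info()   # prints a per-column summary like the one below
```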


Output :

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 33 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   id                       569 non-null    int64
 1   diagnosis                569 non-null    object
 2   radius_mean              569 non-null    float64
 3   texture_mean             569 non-null    float64
 4   perimeter_mean           569 non-null    float64
 5   area_mean                569 non-null    float64
 6   smoothness_mean          569 non-null    float64
 7   compactness_mean         569 non-null    float64
 8   concavity_mean           569 non-null    float64
 9   concave points_mean      569 non-null    float64
 10  symmetry_mean            569 non-null    float64
 11  fractal_dimension_mean   569 non-null    float64
 12  radius_se                569 non-null    float64
 13  texture_se               569 non-null    float64
 14  perimeter_se             569 non-null    float64
 15  area_se                  569 non-null    float64
 16  smoothness_se            569 non-null    float64
 17  compactness_se           569 non-null    float64
 18  concavity_se             569 non-null    float64
 19  concave points_se        569 non-null    float64
 20  symmetry_se              569 non-null    float64
 21  fractal_dimension_se     569 non-null    float64
 22  radius_worst             569 non-null    float64
 23  texture_worst            569 non-null    float64
 24  perimeter_worst          569 non-null    float64
 25  area_worst               569 non-null    float64
 26  smoothness_worst         569 non-null    float64
 27  compactness_worst        569 non-null    float64
 28  concavity_worst          569 non-null    float64
 29  concave points_worst     569 non-null    float64
 30  symmetry_worst           569 non-null    float64
 31  fractal_dimension_worst  569 non-null    float64
 32  Unnamed: 32              0 non-null      float64
dtypes: float64(31), int64(1), object(1)
memory usage: 146.8+ KB

Code : Dropping the ‘id’ and ‘Unnamed: 32’ columns, as they play no role in prediction
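
A sketch of the drop, on a tiny hypothetical frame (the lesson applies it to the full DataFrame):

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame containing the two columns being dropped.
data = pd.DataFrame({'id': [1, 2], 'diagnosis': ['M', 'B'],
                     'radius_mean': [17.99, 20.57],
                     'Unnamed: 32': [np.nan, np.nan]})

# 'id' is just an identifier and 'Unnamed: 32' is entirely empty.
data = data.drop(['Unnamed: 32', 'id'], axis=1)
print(list(data.columns))   # ['diagnosis', 'radius_mean']
```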

Code : Input and Output data
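
This cell presumably encodes the diagnosis label numerically and separates features from the target; a sketch on stand-in rows:

```python
import pandas as pd

data = pd.DataFrame({'diagnosis': ['M', 'B', 'B'],
                     'radius_mean': [17.99, 13.54, 12.45]})  # stand-in rows

# Encode the label: malignant = 1, benign = 0.
data['diagnosis'] = [1 if v == 'M' else 0 for v in data['diagnosis']]
y = data['diagnosis'].values               # output (target)
x_data = data.drop(['diagnosis'], axis=1)  # input features
print(y)   # [1 0 0]
```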

Code : Normalisation
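
Min–max scaling is the usual choice at this step, though the exact formula used in the notebook isn't shown; a sketch on a stand-in column:

```python
import pandas as pd

x_data = pd.DataFrame({'radius_mean': [10.0, 15.0, 20.0]})  # stand-in values

# Min-max scaling: each feature is mapped onto [0, 1] so that no
# feature dominates the gradient updates purely by its scale.
x = (x_data - x_data.min()) / (x_data.max() - x_data.min())
print(x['radius_mean'].tolist())   # [0.0, 0.5, 1.0]
```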

Code : Splitting data for training and testing.
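
A 15% test fraction reproduces the 483/86 split of the 569 rows shown below, and the printed shapes suggest the feature arrays were transposed so each column is one example. Note the lesson's output shows 32 feature rows, while scikit-learn's copy of the dataset (used here as a stand-in) has 30:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

raw = load_breast_cancer()                          # stand-in for the CSV
x = (raw.data - raw.data.min(0)) / (raw.data.max(0) - raw.data.min(0))
y = (raw.target == 0).astype(int)                   # 1 = malignant

# 15% held out for testing: 569 rows -> 483 train / 86 test.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.15, random_state=42)

# Transpose so each column is one example, matching the printed shapes.
x_train, x_test = x_train.T, x_test.T
print('x train:', x_train.shape)   # (30, 483) here; the lesson shows 32 features
print('x test:', x_test.shape)     # (30, 86)
```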


Output :

x train:  (32, 483)
x test: (32, 86)
y train: (483,)
y test: (86,)

Code : Weight and bias
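
The initialisation cell presumably sets every weight to a small constant and the bias to zero, a common choice in this tutorial style; a sketch (the function name is an assumption):

```python
import numpy as np

def initialize_weights_and_bias(dimension):
    """Start with small constant weights and zero bias."""
    w = np.full((dimension, 1), 0.01)
    b = 0.0
    return w, b

w, b = initialize_weights_and_bias(30)
print(w.shape, b)   # (30, 1) 0.0
```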


Code : Sigmoid function – converting the z value into a probability.
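
The sigmoid squashes the linear score z = wᵀx + b into the interval (0, 1), so it can be read as a probability; a sketch:

```python
import numpy as np

def sigmoid(z):
    """Map the linear score z = w.T @ x + b to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))   # 0.5
```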


Code : Forward-Backward Propagation
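
Forward propagation computes the predicted probabilities and the cross-entropy cost; backward propagation computes the gradients of that cost with respect to the weights and bias. A sketch consistent with the cost values printed later (function and dictionary key names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_backward_propagation(w, b, x_train, y_train):
    m = x_train.shape[1]                     # number of examples (columns)
    # Forward: probability and cross-entropy cost.
    y_head = sigmoid(np.dot(w.T, x_train) + b)
    loss = -y_train * np.log(y_head) - (1 - y_train) * np.log(1 - y_head)
    cost = np.sum(loss) / m
    # Backward: gradients of the cost w.r.t. w and b.
    dw = np.dot(x_train, (y_head - y_train).T) / m
    db = np.sum(y_head - y_train) / m
    return cost, {'derivative_weight': dw, 'derivative_bias': db}

# Tiny check: 2 features x 3 examples.
w, b = np.full((2, 1), 0.01), 0.0
x = np.array([[0.1, 0.5, 0.9], [0.2, 0.4, 0.8]])
y = np.array([0, 1, 1])
cost, grads = forward_backward_propagation(w, b, x, y)
print(cost)   # close to log 2, since predictions start near 0.5
```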


Code : Updating Parameters
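
Plain gradient descent: repeatedly step w and b against their gradients, printing the cost every 10 iterations as in the output further down. A self-contained sketch with the propagation helper inlined:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_backward_propagation(w, b, x_train, y_train):
    m = x_train.shape[1]
    y_head = sigmoid(np.dot(w.T, x_train) + b)
    cost = np.sum(-y_train * np.log(y_head)
                  - (1 - y_train) * np.log(1 - y_head)) / m
    dw = np.dot(x_train, (y_head - y_train).T) / m
    db = np.sum(y_head - y_train) / m
    return cost, {'derivative_weight': dw, 'derivative_bias': db}

def update(w, b, x_train, y_train, learning_rate, number_of_iterations):
    """Gradient descent, printing the cost every 10 steps."""
    cost_list = []
    for i in range(number_of_iterations):
        cost, grads = forward_backward_propagation(w, b, x_train, y_train)
        w = w - learning_rate * grads['derivative_weight']
        b = b - learning_rate * grads['derivative_bias']
        if i % 10 == 0:
            cost_list.append(cost)
            print('Cost after iteration %i: %f' % (i, cost))
    return {'weight': w, 'bias': b}, cost_list

# Tiny demo: the cost should fall on this toy data.
x = np.array([[0.1, 0.5, 0.9], [0.2, 0.4, 0.8]])
y = np.array([0, 1, 1])
params, costs = update(np.full((2, 1), 0.01), 0.0, x, y, 1.0, 50)
```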


Code : Predictions
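
Prediction thresholds the sigmoid output at 0.5 (1 = malignant, 0 = benign); a sketch with a tiny worked example:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(w, b, x_test):
    """Probability above 0.5 -> class 1 (malignant), else 0 (benign)."""
    z = sigmoid(np.dot(w.T, x_test) + b)
    return (z > 0.5).astype(int)

w = np.array([[2.0], [-1.0]])
x = np.array([[1.0, 0.0], [0.0, 3.0]])  # two examples, one per column
print(predict(w, 0.0, x))   # [[1 0]]
```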


Code : Logistic Regression
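
A sketch tying the pieces together, run on scikit-learn's copy of the dataset as a stand-in for the Kaggle CSV. The learning rate is an assumption, and 150 iterations matches the printed cost trace; the lesson's ~37% accuracies suggest its hyperparameters or label encoding differed, so the numbers here will not match its output:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_backward_propagation(w, b, x_train, y_train):
    m = x_train.shape[1]
    y_head = sigmoid(np.dot(w.T, x_train) + b)
    cost = np.sum(-y_train * np.log(y_head)
                  - (1 - y_train) * np.log(1 - y_head)) / m
    dw = np.dot(x_train, (y_head - y_train).T) / m
    db = np.sum(y_head - y_train) / m
    return cost, {'derivative_weight': dw, 'derivative_bias': db}

def update(w, b, x_train, y_train, learning_rate, number_of_iterations):
    for i in range(number_of_iterations):
        cost, grads = forward_backward_propagation(w, b, x_train, y_train)
        w = w - learning_rate * grads['derivative_weight']
        b = b - learning_rate * grads['derivative_bias']
        if i % 10 == 0:
            print('Cost after iteration %i: %f' % (i, cost))
    return {'weight': w, 'bias': b}

def predict(w, b, x):
    return (sigmoid(np.dot(w.T, x) + b) > 0.5).astype(int)

def logistic_regression(x_train, y_train, x_test, y_test,
                        learning_rate, number_of_iterations):
    w, b = np.full((x_train.shape[0], 1), 0.01), 0.0
    p = update(w, b, x_train, y_train, learning_rate, number_of_iterations)
    train_acc = 100 - np.mean(np.abs(predict(p['weight'], p['bias'], x_train) - y_train)) * 100
    test_acc = 100 - np.mean(np.abs(predict(p['weight'], p['bias'], x_test) - y_test)) * 100
    print('train accuracy: {} %'.format(train_acc))
    print('test accuracy: {} %'.format(test_acc))
    return train_acc, test_acc

raw = load_breast_cancer()                      # stand-in for the Kaggle CSV
x = (raw.data - raw.data.min(0)) / (raw.data.max(0) - raw.data.min(0))
y = (raw.target == 0).astype(int)               # 1 = malignant
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.15, random_state=42)
train_acc, test_acc = logistic_regression(
    x_tr.T, y_tr, x_te.T, y_te, learning_rate=1.0, number_of_iterations=150)
```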


Output :

Cost after iteration 0: 0.692836
Cost after iteration 10: 0.498576
Cost after iteration 20: 0.404996
Cost after iteration 30: 0.350059
Cost after iteration 40: 0.313747
Cost after iteration 50: 0.287767
Cost after iteration 60: 0.268114
Cost after iteration 70: 0.252627
Cost after iteration 80: 0.240036
Cost after iteration 90: 0.229543
Cost after iteration 100: 0.220624
Cost after iteration 110: 0.212920
Cost after iteration 120: 0.206175
Cost after iteration 130: 0.200201
Cost after iteration 140: 0.194860

Output :

train accuracy: 37.267080745341616 %
test accuracy: 37.2093023255814 %

Code : Checking results with linear_model.LogisticRegression
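
The cross-check presumably fits `sklearn.linear_model.LogisticRegression` on the same split; a sketch on the stand-in data (scikit-learn expects one example per row, so the arrays are not transposed here):

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

raw = load_breast_cancer()                      # stand-in for the Kaggle CSV
x = (raw.data - raw.data.min(0)) / (raw.data.max(0) - raw.data.min(0))
y = (raw.target == 0).astype(int)               # 1 = malignant
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.15, random_state=42)

# max_iter=150 mirrors the 150 iterations used in the from-scratch version.
logreg = linear_model.LogisticRegression(random_state=42, max_iter=150)
logreg.fit(x_train, y_train)
print('test accuracy: {}'.format(logreg.score(x_test, y_test)))
print('train accuracy: {}'.format(logreg.score(x_train, y_train)))
```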


Output :

test accuracy: 0.627906976744186 
train accuracy: 0.6273291925465838

Get the complete notebook and dataset link here:

Notebook link : click here.

Dataset link : click here