Project name: Submission Clustering and Classification using Various Classification Algorithm

Project Summary

Error Notes

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 5
      2 plt.figure(figsize=(12, 10))
      3 correlation_matrix = data_numerik.corr()
----> 5 sns.heatmap(correlation_matrix, annot=False, cmap='coolwarm', vmin=-1, vmax=1)
      6 plt.title('Correlation Matrix')
      7 plt.show()

NameError: name 'sns' is not defined

There is an error in cell 15. The NameError means that the name 'sns' has not been defined in the notebook (it is an import alias, not a submodule), so the seaborn library must be imported before it is used. The fix:


import seaborn as sns  # Make sure to import the libraries you need

# Visualize the correlation between numeric variables
plt.figure(figsize=(12, 10))
correlation_matrix = data_numerik.corr()

sns.heatmap(correlation_matrix, annot=False, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix')
plt.show()


# Calculate IQR for each numerical feature
for feature in data_numerik.drop(columns=["ID"]).columns:
    Q1 = data_selection[feature].quantile(0.25)
    Q3 = data_selection[feature].quantile(0.75)
    IQR = Q3 - Q1

    # Calculate lower and upper bounds for outliers
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Apply the lambda function to cap outliers
    data_selection[feature] = data_selection[feature].apply(
        lambda x: lower_bound if x < lower_bound else upper_bound if x > upper_bound else x
    )

    # Plotting boxplot for each feature
    plt.figure(figsize=(10, 6))
    sns.boxplot(x=data_selection[feature])
    plt.title(f'Box Plot of {feature}')
    plt.show()

When cell 25 (the code above) is run, the following error is raised:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File c:\Users\aliff\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py:3791, in Index.get_loc(self, key)
   3790 try:
-> 3791     return self._engine.get_loc(casted_key)
   3792 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Year_Birth'

The above exception was the direct cause of the following exception:

KeyError: 'Year_Birth'

This happens because data_selection has no feature (column) named 'Year_Birth': the loop iterates over the columns of data_numerik but reads each one from data_selection.
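A quick way to confirm this kind of mismatch is to compare the column indexes of the two DataFrames before looping. The snippet below is only a sketch: the two small frames are hypothetical stand-ins for the notebook's data_numerik and data_selection, and the column 'Income' is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's data_numerik and data_selection.
data_numerik = pd.DataFrame({
    "ID": [1, 2, 3],
    "Year_Birth": [1970, 1985, 1990],
    "Income": [100.0, 200.0, 300.0],
})
data_selection = pd.DataFrame({
    "ID": [1, 2, 3],
    "Income": [100.0, 200.0, 300.0],
})

# Columns the loop would iterate over but that data_selection does not have.
missing = data_numerik.columns.difference(data_selection.columns)
print(list(missing))

# Columns that exist in both frames and are safe to index in either one.
shared = data_numerik.columns.intersection(data_selection.columns)
print(list(shared))
```

Any name listed in missing would raise the same KeyError if it were used to index data_selection.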

To draw box plots for all the features in data_numerik, fix the code as follows:

# Calculate IQR for each numerical feature
for feature in data_numerik.drop(columns=["ID"]).columns:
    Q1 = data_numerik[feature].quantile(0.25)  # replaced data_selection with data_numerik
    Q3 = data_numerik[feature].quantile(0.75)  # replaced data_selection with data_numerik
    IQR = Q3 - Q1

    # Calculate lower and upper bounds for outliers
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Apply the lambda function to cap outliers
    data_numerik[feature] = data_numerik[feature].apply(
        lambda x: lower_bound if x < lower_bound else upper_bound if x > upper_bound else x
    )

    # Plotting boxplot for each feature
    plt.figure(figsize=(10, 6))
    sns.boxplot(x=data_numerik[feature])
    plt.title(f'Box Plot of {feature}')
    plt.show()

Alternatively, to draw box plots for the features in data_selection, fix it as follows:

# Calculate IQR for each numerical feature
for feature in data_selection.drop(columns=["ID"]).select_dtypes(include=['float64', 'int64']).columns:
    Q1 = data_selection[feature].quantile(0.25)
    Q3 = data_selection[feature].quantile(0.75)
    IQR = Q3 - Q1

    # Calculate lower and upper bounds for outliers
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Apply the lambda function to cap outliers
    data_selection[feature] = data_selection[feature].apply(
        lambda x: lower_bound if x < lower_bound else upper_bound if x > upper_bound else x
    )

    # Plotting boxplot for each feature
    plt.figure(figsize=(10, 6))
    sns.boxplot(x=data_selection[feature])
    plt.title(f'Box Plot of {feature}')
    plt.show()
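As an optional alternative to the apply/lambda step, the same IQR capping can be expressed with the vectorized Series.clip method. This is a sketch and not the notebook's own code: the helper name cap_outliers_iqr, its k parameter, and the sample data are assumptions made for illustration, and plotting is left out so the example runs without matplotlib or seaborn.

```python
import pandas as pd


def cap_outliers_iqr(df, columns, k=1.5):
    """Return a copy of df with each listed column clipped to
    [Q1 - k * IQR, Q3 + k * IQR]. Hypothetical helper for illustration."""
    capped = df.copy()
    for col in columns:
        q1 = capped[col].quantile(0.25)
        q3 = capped[col].quantile(0.75)
        iqr = q3 - q1
        # clip replaces values below/above the bounds with the bounds themselves
        capped[col] = capped[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return capped


# Hypothetical data: one obvious outlier in 'Income'.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Income": [10.0, 12.0, 11.0, 13.0, 100.0],
})

# Take the numeric columns from the same frame that will be indexed,
# which avoids the KeyError discussed above.
numeric_cols = df.drop(columns=["ID"]).select_dtypes(include="number").columns
capped = cap_outliers_iqr(df, numeric_cols)
print(capped["Income"].tolist())
```

Because the helper works on a copy, the original frame keeps its raw values, which can be useful when comparing box plots before and after capping.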