Into AI

What Is Class Imbalance In Machine Learning & How To Fix It

Avoid the frustration of dealing with imbalanced real-world datasets and learn to fix them with ease

Dr. Ashish Bamania
Sep 10, 2023
Credits: Midjourney

What Is Class Imbalance?

Real-world datasets are messy (unlike the clean example datasets that ship with Scikit-Learn).

Class imbalance arises when the distribution of examples across different classes is not uniform. 

In other words, some classes have a lot more samples than others. 

For example, think about a dataset where the task is to detect a rare lung disease on chest X-rays (a binary classification problem). Out of 10,000 patients, only 50 might have the disease, while 9,950 do not.
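To put numbers on this example, here is a minimal sketch that computes how small the minority class is. The label encoding (1 for disease, 0 for no disease) and the variable names are assumptions made for illustration; only the counts come from the example above.

```python
from collections import Counter

# Labels mirroring the example: 1 = disease, 0 = no disease (assumed encoding).
labels = [1] * 50 + [0] * 9_950

counts = Counter(labels)
total = sum(counts.values())

minority_share = counts[1] / total        # fraction of patients with the disease
imbalance_ratio = counts[0] / counts[1]   # healthy patients per diseased patient

print(f"Minority share: {minority_share:.2%}")      # Minority share: 0.50%
print(f"Imbalance ratio: {imbalance_ratio:.0f}:1")  # Imbalance ratio: 199:1
```

In other words, the model would see roughly 199 healthy patients for every patient with the disease.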

The same can happen in a regression problem, such as predicting house prices in a city. Most houses are priced between $100,000 and $500,000, but a few luxury mansions are priced at over $10 million, so the upper end of the price range is represented by very few samples.

Training on such datasets can bias the model towards the majority class: it learns to predict the common cases well while failing to detect the minority class.
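A quick way to see why this matters is to score a trivial "model" that always predicts the majority class on the chest X-ray example. This is only an illustrative sketch: the always-negative predictor and the variable names are assumptions, not a method from this post.

```python
# Ground truth from the example: 50 patients with the disease (1), 9,950 without (0).
y_true = [1] * 50 + [0] * 9_950

# Hypothetical baseline that always predicts "no disease".
y_pred = [0] * len(y_true)

# Accuracy: fraction of all predictions that match the ground truth.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: fraction of actual disease cases that were detected.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 99.50%
print(f"Recall: {recall:.2%}")      # Recall: 0.00%
```

The baseline looks excellent on accuracy while missing every single patient with the disease, which is why accuracy alone can be misleading on imbalanced data.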
