Categorical features with high cardinality: Dealing with Feature Hashing

“Dealing with categorical features with high cardinality: Feature Hashing” is an interesting question, and this post should help a lot of learners. Introduction to Feature Hashing: Many machine learning algorithms cannot use non-numeric data directly. Categorical features are usually represented as strings, so we need some way …
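As a rough illustration of the feature hashing idea named in this excerpt, the sketch below maps string categories into a fixed number of buckets. The function names (`hash_feature`, `hash_encode`), the bucket count, and the choice of MD5 as the hash are assumptions made for the example, not details taken from the post.

```python
import hashlib


def hash_feature(value: str, n_buckets: int = 16) -> int:
    """Map a category string to a bucket index in [0, n_buckets)."""
    # A stable hash is used so the mapping is the same across runs
    # (Python's built-in hash() is salted per process for strings).
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(values, n_buckets: int = 16):
    """Count how many of the given categories fall into each bucket."""
    vector = [0] * n_buckets
    for value in values:
        vector[hash_feature(value, n_buckets)] += 1
    return vector


# Hypothetical high-cardinality feature: city names.
cities = ["london", "paris", "tokyo", "paris"]
encoded = hash_encode(cities, n_buckets=8)
print(encoded)
```

The output vector has a fixed length no matter how many distinct categories appear. The price is that two different categories can collide in the same bucket.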

Deal with class imbalance (without generating synthetic samples): Clustering Based Bagging Algorithm (CBBA)

For ideas on dealing with class imbalance, take a look at “In classification, how do you handle an unbalanced training set?”. The answers are very creative. The rookie way to deal with class imbalance: Under-sampling the majority class is an effective method for classifying imbalanced data sets. But …
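The under-sampling baseline mentioned in this excerpt can be sketched as follows. This is plain random under-sampling, not the Clustering Based Bagging Algorithm from the title; the function name `undersample_majority` and the toy data are assumptions for illustration.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Size of the smallest class decides how many rows each class keeps.
    n_keep = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, n_keep))
        out_labels.extend([label] * n_keep)
    return out_rows, out_labels


# Toy data: 9 majority-class rows (label 0) and 3 minority-class rows (label 1).
rows = list(range(12))
labels = [0] * 9 + [1] * 3
balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(Counter(balanced_labels))
```

Every minority row is kept, while most majority rows are discarded.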

Understanding Heteroskedasticity and Transformations in Linear Regression Analysis

Linear regression is a widely used statistical method for predicting outcomes based on input variables. However, analyzing the results of a linear regression model can be complicated, particularly when there is heteroskedasticity, that is, a violation of the homoscedasticity (constant error variance) assumption. This can lead to unreliable standard errors, and therefore unreliable inference and prediction intervals, and can be challenging to diagnose and …
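To make heteroskedasticity concrete, here is a minimal sketch that simulates data whose noise grows with the input, fits a straight line, and compares the residual spread at low and high input values. The simulated data, the helper names, and the simple split-in-half comparison are all assumptions for illustration; it is an informal check, not a formal test.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_square(values):
    return sum(v * v for v in values) / len(values)


rng = random.Random(0)
# Inputs sorted from 1 to 10; the noise standard deviation is proportional to x.
xs = [1 + 9 * i / 199 for i in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.5 * x) for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Compare residual spread for the lower and upper half of the inputs.
low_spread = mean_square(residuals[:100])
high_spread = mean_square(residuals[100:])
ratio = high_spread / low_spread
print(f"residual variance ratio (high x / low x): {ratio:.2f}")
```

A ratio well above 1 suggests that the error variance is not constant across the range of the input.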

Curse of Dimensionality: An intuitive and practical explanation with examples

This article, “Curse of Dimensionality: An Intuitive and practical explanation with Examples”, should consolidate your understanding of the concept. “As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.” (Charles Isbell, Professor and Senior Associate Dean, School of Interactive Computing, Georgia Tech) Curse of dimensionality: The common theme …
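One simple way to see the exponential growth described in the quote is to count how many cells a grid has when each feature is split into a fixed number of bins. The function name `cells_needed` and the choice of 10 bins per feature are assumptions for this sketch.

```python
def cells_needed(bins_per_axis: int, n_dims: int) -> int:
    """Number of grid cells when every dimension is split into bins_per_axis bins."""
    return bins_per_axis ** n_dims


# With 10 bins per feature, covering each cell with at least one sample
# requires at least this many samples.
table = {d: cells_needed(10, d) for d in (1, 2, 3, 10)}
for n_dims, n_cells in table.items():
    print(f"{n_dims:>2} dimensions -> {n_cells:,} cells")
```

Going from 3 to 10 features raises the cell count from a thousand to ten billion, so a data set that covers three dimensions densely is extremely sparse in ten.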