How to generate and interpret a ROC curve for binary classification?


Let’s start this post with a question: “How do you generate and interpret a ROC curve for binary classification?” This post tries to answer it. Binary classification is the task of sorting the elements of a set into two groups. A ROC (receiver operating characteristic) curve is used to diagnose the performance of a classification …
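As a rough illustration of how a ROC curve is generated, here is a minimal sketch. It assumes labels in {0, 1} and a real-valued score per sample (higher means "more likely positive"). The function names `roc_curve_points` and `trapezoid_auc` are hypothetical, not taken from the post. It sweeps the decision threshold from the highest score to the lowest and records the false positive rate (FPR) and true positive rate (TPR) at each step.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sort samples by score, highest first.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr, tpr, trapezoid_auc(fpr, tpr))
```

For this toy input the curve passes through (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an AUC of 0.75. When interpreting the curve, a point closer to the top-left corner means a better trade-off between catching positives and raising false alarms. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.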

What is hypothesis testing in data science?

Hypothesis testing is a statistical technique for evaluating hypotheses about a population based on sample data. In data science, it is an essential tool for drawing conclusions about a population from a representative sample. In this blog, we will discuss the key aspects of hypothesis testing, including the null hypothesis, the alternative hypothesis, …
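To make the null and alternative hypotheses concrete, the sketch below uses a two-sided permutation test for a difference in means. This is one technique chosen here for illustration; it is not necessarily the one the post covers, and the function name `permutation_test` is hypothetical. Under the null hypothesis the two samples come from the same distribution, so shuffling the group labels should produce differences at least as large as the observed one fairly often. The p-value is the fraction of shuffles that do.

```python
import random
from statistics import fmean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for H0: a and b share one distribution.

    Test statistic: absolute difference in sample means.
    H1 (alternative): the means differ.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel samples at random, as H0 allows
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # The +1 terms count the observed labelling as one of the permutations.
    return (extreme + 1) / (n_permutations + 1)


p = permutation_test([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
print("reject H0 at 5%" if p < 0.05 else "fail to reject H0", p)
```

A small p-value (for example, below a pre-chosen significance level of 0.05) is taken as evidence against the null hypothesis. A large one means the data are consistent with it, which is not the same as proving it true.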

Categorical features with high cardinality: Dealing with Feature Hashing


“Dealing with categorical features with high cardinality: Feature Hashing” is an interesting question, and this post should help a lot of learners. Introduction to Feature Hashing: many machine learning algorithms cannot use non-numeric data directly, yet categorical features are usually represented as strings. So we need some way …
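To show how the hashing trick turns string-valued categories into a fixed-length numeric vector, here is a minimal sketch. The function name `hash_features`, the MD5 hash, and the default of 16 buckets are illustrative assumptions, not details from the post. Each token is hashed to a bucket index, and a second bit of the hash chooses a +1 or -1 sign so that collisions tend to cancel rather than pile up. `hashlib` is used instead of Python's built-in `hash()` because string hashing in `hash()` is randomized per process.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map an iterable of string tokens to a fixed-size signed count vector."""
    vec = [0.0] * n_features
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")  # 64-bit integer from the digest
        index = h % n_features                    # bucket chosen by the low bits
        sign = -1.0 if (h >> 63) & 1 else 1.0     # sign chosen by the top bit
        vec[index] += sign
    return vec


# A high-cardinality feature like "city" becomes "feature=value" tokens.
row = ["city=London", "browser=Firefox", "device=mobile"]
print(hash_features(row))
```

The vector length stays at `n_features` no matter how many distinct cities or browsers appear, which is the main appeal for high-cardinality features. The cost is that unrelated categories can collide in the same bucket. scikit-learn provides a production implementation as `sklearn.feature_extraction.FeatureHasher`.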

Deal with class imbalance (without generating synthetic samples): Clustering Based Bagging Algorithm (CBBA)


To deal with class imbalance, take a look at the question “In classification, how do you handle an unbalanced training set?”; the answers to it are very creative. The rookie’s way to deal with class imbalance: under-sampling the majority class is an effective method for classifying imbalanced data sets. But …
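For reference, the “rookie’s way” of plain random under-sampling can be sketched as follows. This is not CBBA; it is the simple baseline the excerpt mentions, and the function name `undersample_majority` is hypothetical. It keeps every sample of the smallest class and randomly keeps the same number from each larger class.

```python
import random
from collections import defaultdict


def undersample_majority(X, y, seed=0):
    """Randomly drop samples from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)

    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        kept = rng.sample(rows, n_min)  # sample without replacement
        X_out.extend(kept)
        y_out.extend([label] * n_min)
    return X_out, y_out


X = list(range(10))
y = [0] * 8 + [1] * 2          # 8 majority samples, 2 minority samples
X_bal, y_bal = undersample_majority(X, y)
print(X_bal, y_bal)
```

The balanced set is quick to build, but the discarded majority-class samples never reach the model, so information is thrown away outright. Note also that the output is grouped by class, so shuffle it before any order-sensitive training.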