Data Science

The Data Science category covers a wide range of topics related to the analysis and interpretation of data, including data mining, machine learning, artificial intelligence, statistical analysis, data visualization, big data, predictive modeling, and data engineering.

In this category, you can find articles, resources, and insights related to data science, including best practices, tools, and techniques used by data scientists. Whether you are a beginner or an experienced practitioner, this category can help you stay up-to-date with the latest trends and techniques in data science. You can also find tutorials, case studies, and examples of real-world applications of data science in various fields, such as healthcare, finance, marketing, and more.

How to generate and interpret a ROC curve for binary classification?

Updated April 29, 2023 (first published September 22, 2021) by Kumar Vishwesh

Let’s start this post with a question: how do you generate and interpret a ROC curve for binary classification? This post tries to answer that question. Binary classification is the task of classifying the elements of a set into two groups. The ROC (receiver operating characteristic) curve is used to diagnose the performance of a classification …
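As a rough sketch of how the curve is generated, the pure-Python snippet below sweeps a decision threshold from the highest score to the lowest, records the false positive rate (FPR) and true positive rate (TPR) at each step, and then integrates the curve with the trapezoidal rule to get the area under it (AUC). The function names roc_points and auc_trapezoid and the toy labels and scores are hypothetical; this is an illustration, not necessarily the approach taken in the post, and in practice a library routine such as scikit-learn's roc_curve and roc_auc_score would normally be used.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for every distinct threshold, highest first.

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    # Rank samples from most to least confident "positive" prediction.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold to this score; tied samples flip to "positive" together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))


# Hypothetical toy data: true labels and the classifier's scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(list(zip(fpr, tpr)))
print(auc_trapezoid(fpr, tpr))  # 0.75
```

To interpret the result: points toward the top-left corner mean the classifier catches more positives while raising fewer false alarms. A curve along the diagonal (AUC of about 0.5) is what random guessing produces, and an AUC of 1.0 means every positive is scored above every negative.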

What is hypothesis testing in data science?

Updated April 26, 2023 (first published August 14, 2021) by Kumar Vishwesh

Hypothesis testing is a statistical technique for evaluating hypotheses about a population using sample data. In data science, it is an essential tool for drawing inferences about a population from a representative sample. In this post, we will discuss the key aspects of hypothesis testing, including the null hypothesis, the alternative hypothesis, …
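To make the null and alternative hypotheses concrete, here is a minimal sketch of one simple test: an exact two-sided binomial test of whether a coin is fair. The null hypothesis is H0: p = 0.5 and the alternative is H1: p ≠ 0.5; the p-value sums the probabilities, under H0, of every outcome that is no more likely than the observed one. The coin example, the 0.05 significance level, and the function name binomial_test_two_sided are illustrative assumptions; scipy.stats.binomtest provides a tested implementation of the same kind of test.

```python
from math import comb


def binomial_test_two_sided(k, n, p0=0.5):
    """Exact two-sided binomial test of H0: success probability == p0."""
    # Probability of each possible number of successes under H0.
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probabilities of all outcomes no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))


alpha = 0.05  # assumed significance level
p_value = binomial_test_two_sided(9, 10)  # 9 heads in 10 flips
print(round(p_value, 4))  # 0.0215
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With 9 heads in 10 flips the p-value is 22/1024, which falls below α = 0.05, so the sketch rejects H0. A p-value above α would only mean the data are not strong enough evidence against H0, not that H0 has been proven true.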

What is a risk score and what is a credit score?

Updated April 23, 2023 (first published July 14, 2021) by Kumar Vishwesh

What is a risk score, and what is a credit score? It is important to know these concepts before you start building an ML model for credit risk scoring. What is a risk score? A risk score is a mathematical score based on individual risk factors. The risk score assesses the risks that …
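A score "based on individual risk factors" can take many forms. One common pattern, sketched below purely under assumptions, is a weighted sum of the risk factors passed through the logistic function so the result lies between 0 and 1. The factor names, weights, and bias are hypothetical placeholders, not values from the post; in a real credit risk model they would be learned from historical data, for example by logistic regression.

```python
import math

# Hypothetical factor weights and bias, chosen only for illustration;
# a real model would learn these from historical data.
WEIGHTS = {"missed_payments": 0.8, "debt_to_income": 2.5, "years_of_history": -0.1}
BIAS = -3.0


def risk_score(applicant):
    """Weighted sum of risk factors, squashed to (0, 1) by the logistic function."""
    z = BIAS + sum(w * applicant[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical applicants.
low_risk = {"missed_payments": 0, "debt_to_income": 0.2, "years_of_history": 10}
high_risk = {"missed_payments": 3, "debt_to_income": 0.6, "years_of_history": 2}
print(f"{risk_score(low_risk):.3f}")
print(f"{risk_score(high_risk):.3f}")
```

In this sketch, positive weights push the score up (more risk) and negative weights pull it down, so a longer credit history lowers the hypothetical applicant's score.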

Categorical features with high cardinality: Dealing with Feature Hashing

Updated April 30, 2023 (first published May 30, 2021) by Kumar Vishwesh

“Dealing with categorical features with high cardinality: Feature Hashing” is an interesting question, and this post should help a lot of learners. Introduction to feature hashing: many machine learning algorithms cannot use non-numeric data directly. Categorical features are often represented as strings, so we need some way …
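The hashing trick itself is short enough to sketch. The Python function below maps each string token (for example "city=Paris") to one of n_features buckets via a hash, and uses a separate bit of the same hash to choose a +1 or -1 sign, so that colliding features partly cancel out on average instead of always adding up. Using MD5 from hashlib and a default of 8 buckets are assumptions made for a deterministic, dependency-free example (Python's built-in hash() is randomized per process for strings); scikit-learn's FeatureHasher, for instance, uses MurmurHash3 instead. The function name hash_features is hypothetical.

```python
import hashlib


def hash_features(tokens, n_features=8):
    """Map string tokens into a fixed-length vector with the signed hashing trick."""
    vec = [0.0] * n_features
    for token in tokens:
        # MD5 gives a hash that is stable across runs (unlike the built-in hash()).
        digest = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "little")
        index = digest % n_features  # which bucket the token lands in
        sign = -1.0 if (digest >> 63) & 1 else 1.0  # top bit of the hash picks the sign
        vec[index] += sign
    return vec


# A hypothetical row with three high-cardinality categorical features.
row = ["color=red", "city=Paris", "device=mobile"]
print(hash_features(row))
```

The vector length stays fixed at n_features no matter how many distinct categories appear, which is what makes the trick attractive for high-cardinality features; prefixing each value with its feature name keeps the same value in different columns from always sharing a bucket.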

Deal with class imbalance (without generating synthetic samples): Clustering Based Bagging Algorithm (CBBA)

Updated April 30, 2023 (first published April 30, 2021) by Kumar Vishwesh

For ideas on dealing with class imbalance, take a look at the answers to the question “In classification, how do you handle an unbalanced training set?”; they are definitely very creative. The rookie way to deal with class imbalance: under-sampling the majority class is an effective method for classifying imbalanced data sets. But …
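The rookie approach mentioned above, random under-sampling of the majority class, can be sketched in a few lines of Python. The function below shrinks every class to the size of the smallest one by sampling without replacement. The function name undersample_majority, the fixed seed, and the toy 90/10 data are illustrative assumptions, and this is not the Clustering Based Bagging Algorithm the post goes on to describe.

```python
import random
from collections import Counter


def undersample_majority(X, y, seed=0):
    """Randomly drop rows so every class matches the minority-class count."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    target = min(Counter(y).values())
    # Group the rows by their class label.
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        keep = rng.sample(rows, target)  # sample without replacement
        X_out.extend(keep)
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical imbalanced data: 90 negatives, 10 positives.
X = list(range(100))
y = [0] * 90 + [1] * 10
Xb, yb = undersample_majority(X, y)
print(sorted(Counter(yb).items()))  # [(0, 10), (1, 10)]
```

Every minority-class row survives, while only 10 of the 90 majority-class rows are kept, so the balanced training set is much smaller than the original.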