Kolmogorov-Smirnov test in Python

Imagine you are given some data and asked to find the (parametric) probability distribution that best describes it. There are many ways to go about this, and one common approach is to use the Kolmogorov-Smirnov (KS) test to find the best match. Below we will go over a piece of Python code that does this. But first, let's try to understand the logic.

Step by step, the process is as follows:

  1. List the "suspects", i.e. the distributions that could plausibly describe the data. We are fitting parametric distributions, so we need an explicit list of candidate names. For continuous variables, the usual candidates include normal, t, chi-square, F, lognormal, gamma and beta;
  2. Fit each distribution to the data to get its best-fitting parameters;
  3. Run the KS test for every distribution with its fitted parameters, and pick the one that fits best.

Now in code:

https://gist.github.com/kamil-a/6edacd2eacb5d856a27f3915907955f8
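In case the gist is not at hand, here is a minimal sketch of the same three steps using `scipy.stats`. It is not the gist's code: the synthetic chi-square sample, the sample size, the seed and the short candidate list are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic sample (an assumption for illustration):
# 500 draws from a chi-square distribution with 4 degrees of freedom.
rng = np.random.default_rng(0)
data = stats.chi2.rvs(df=4, size=500, random_state=rng)

# Step 1: list the candidate distributions by their scipy.stats names.
candidates = ["norm", "lognorm", "gamma", "chi2", "expon"]

results = {}
for name in candidates:
    dist = getattr(stats, name)
    # Step 2: maximum-likelihood fit; returns the shape parameters (if any),
    # followed by loc and scale.
    params = dist.fit(data)
    # Step 3: one-sample KS test of the data against the fitted distribution.
    statistic, p_value = stats.kstest(data, name, args=params)
    results[name] = (statistic, p_value)

# Rank the candidates from highest to lowest p-value.
ranked = sorted(results.items(), key=lambda kv: kv[1][1], reverse=True)
for name, (statistic, p_value) in ranked:
    print(f"{name:8s} D={statistic:.4f} p={p_value:.4f}")
```

One thing to keep in mind with this sketch: the parameters are estimated from the same data that is then tested, so the p-values tend to be optimistic. They are best read as a way to rank the candidates rather than as exact significance levels.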

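In the gist's output, chi-square has the highest p-value, but gamma also has a high p-value. This is expected, because chi-square is a special case of the gamma distribution: a chi-square distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2.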
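The chi-square and gamma relationship can be checked numerically. The short sketch below compares the two CDFs in `scipy.stats`; the degrees of freedom `k = 4` and the evaluation grid are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

k = 4  # degrees of freedom (arbitrary choice for illustration)
x = np.linspace(0.1, 20, 50)  # points at which to compare the two CDFs

# Chi-square with k degrees of freedom...
chi2_cdf = stats.chi2.cdf(x, df=k)
# ...versus gamma with shape k/2 and scale 2.
gamma_cdf = stats.gamma.cdf(x, a=k / 2, scale=2)

# The two CDFs should agree up to floating-point error.
same = np.allclose(chi2_cdf, gamma_cdf)
print(same)
```

Because the gamma family contains every chi-square distribution, a fitted gamma can match chi-square data about as well as a fitted chi-square can, which is why both end up near the top of the ranking.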

Kamil Alasgarov
