Kolmogorov-Smirnov test in Python
Imagine you are given some data and asked to find the (parametric) probability distribution that best describes it. There are many ways to go about this, but a common approach is to use the Kolmogorov-Smirnov test (KS for short) to find the best-matching one. Below we will go over a piece of Python code which does that. But before that, let's try to understand the logic.
Step by step, the process looks like this:
- List the "suspects", i.e. the distributions which could plausibly describe the data. Keep in mind that we are trying to fit parametric distributions, so building a list of names is crucial. Usually normal, t, chi-square, F, lognormal, gamma, beta and similar names pop up if you are dealing with continuous variables;
- Fit each distribution to the data to get its best-fitting parameters;
- Run the KS test for every distribution with its fitted parameters, and pick the one with the highest p-value as the best fit.
Now in code:
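A minimal sketch of the three steps above, assuming SciPy's `scipy.stats` module. The synthetic chi-square sample, the seed, the sample size and the short list of candidates are all illustrative assumptions, so the exact numbers it prints are specific to this setup.

```python
import numpy as np
from scipy import stats

# Assumption: a synthetic sample stands in for "the data".
# It is drawn from a chi-square distribution with 4 degrees of freedom.
rng = np.random.default_rng(0)
data = rng.chisquare(df=4, size=500)

# Step 1: list the "suspects" (an illustrative subset of continuous distributions).
candidates = {
    "norm": stats.norm,
    "lognorm": stats.lognorm,
    "expon": stats.expon,
    "gamma": stats.gamma,
    "chi2": stats.chi2,
}

results = {}
for name, dist in candidates.items():
    # Step 2: maximum-likelihood fit; returns shape parameters (if any), then loc and scale.
    params = dist.fit(data)
    # Step 3: one-sample KS test of the data against the fitted CDF.
    statistic, p_value = stats.kstest(data, dist.cdf, args=params)
    results[name] = (statistic, p_value)

# Rank the candidates from highest to lowest p-value.
ranking = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (statistic, p_value) in ranking:
    print(f"{name:8s} D={statistic:.4f}  p={p_value:.4f}")
```

One caveat worth keeping in mind: the parameters here are estimated from the same sample that is then tested, so the KS p-values tend to be optimistic. They are still useful for ranking candidates against each other, but should not be read as exact significance levels.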
As you can see, chi-square has the highest p-value, but gamma also has a high p-value. This is because chi-square is a special case of the gamma distribution: a chi-square with k degrees of freedom is a gamma with shape k/2 and scale 2.
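The special-case relationship can be checked numerically. The sketch below assumes SciPy's parameterisation of the two distributions, and the choice of k = 4 and the evaluation grid are arbitrary.

```python
import numpy as np
from scipy import stats

k = 4  # degrees of freedom (arbitrary choice for illustration)
x = np.linspace(0.1, 20.0, 50)

# Chi-square with k degrees of freedom ...
chi2_cdf = stats.chi2.cdf(x, df=k)
# ... is a gamma distribution with shape k/2 and scale 2.
gamma_cdf = stats.gamma.cdf(x, a=k / 2, scale=2)

same = np.allclose(chi2_cdf, gamma_cdf)
print(same)
```

Because the two CDFs coincide, any data that a chi-square fits well will be fitted about as well by a gamma, which is why both end up near the top of the ranking.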