Python: Generate random values from empirical distribution
In Java, I usually rely on the org.apache.commons.math3.random.EmpiricalDistribution class to do the following:
Is there any Python library that provides the same functionality? It seems like scipy.stats.gaussian_kde.resample does something similar, but I'm not sure if it implements the same procedure as the Java type I'm familiar with.
@Kevin: the linked answer doesn't work for this case, because it assumes you already know the analytical form of your distribution, whereas this question is looking for something non-parametric.
– abeboparebop
Aug 7 at 13:46
1 Answer
import numpy as np
import scipy.stats  # a bare "import scipy" does not reliably expose scipy.stats
import matplotlib.pyplot as plt

# This represents the original "empirical" sample -- I fake it by
# sampling from a normal distribution
orig_sample_data = np.random.normal(size=10000)

# Generate a KDE from the empirical sample
sample_pdf = scipy.stats.gaussian_kde(orig_sample_data)

# Sample new datapoints from the KDE; for 1-D data resample() returns
# an array of shape (1, n), so take the first row
new_sample_data = sample_pdf.resample(10000)[0]

# Histogram of initial empirical sample
# (density=True replaces the older normed=True, which newer Matplotlib rejects)
cnts, bins, p = plt.hist(orig_sample_data, label='original sample', bins=100,
                         histtype='step', linewidth=1.5, density=True)

# Histogram of datapoints sampled from KDE, using the same bin edges
plt.hist(new_sample_data, label='sample from KDE', bins=bins,
         histtype='step', linewidth=1.5, density=True)

# Visualize the KDE itself, evaluated at the bin edges
y_kde = sample_pdf(bins)
plt.plot(bins, y_kde, label='KDE')
plt.legend()
plt.show()
new_sample_data should be drawn from roughly the same distribution as the original data (to the degree that the KDE is a good approximation to the original distribution).
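To make it clearer what resample does under the hood, here is a minimal sketch of the same idea written by hand: pick observed points at random with replacement, then add Gaussian noise whose variance is the KDE's kernel variance (a "smoothed bootstrap"). The helper name smoothed_bootstrap and the seed are my own choices for illustration, not part of SciPy, and I am not claiming this matches the procedure used by the Java EmpiricalDistribution class.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Stand-in for the empirical sample, as in the answer above
orig_sample_data = rng.normal(size=10_000)
kde = stats.gaussian_kde(orig_sample_data)


def smoothed_bootstrap(data, kernel_std, size, rng):
    """Hypothetical helper: resample observed points, then jitter each one.

    Drawing a data point uniformly at random and adding N(0, kernel_std**2)
    noise is one way to sample from a Gaussian KDE built on `data`.
    """
    picks = rng.choice(data, size=size, replace=True)
    return picks + rng.normal(scale=kernel_std, size=size)


# For 1-D data, kde.covariance is a 1x1 matrix holding the kernel variance
kernel_std = float(np.sqrt(kde.covariance[0, 0]))

manual_sample = smoothed_bootstrap(orig_sample_data, kernel_std, 10_000, rng)
builtin_sample = kde.resample(10_000)[0]

# The two samples should have similar summary statistics
print(manual_sample.mean(), builtin_sample.mean())
print(manual_sample.std(), builtin_sample.std())
```

One consequence worth knowing: because of the added noise, samples from the KDE are slightly more spread out than the original data, and they can fall outside the observed range. If you need values restricted to what was actually observed, plain rng.choice(orig_sample_data, size=n) without the noise step may be closer to what you want.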
I think the accepted answer here has what you're looking for.
– Kevin
Feb 16 '16 at 14:42