Sampling from a specific part of a normal distribution in R



I'm trying to first extract all values <= -4 (call these p1) from a mother normal distribution. Then I want to randomly sample 50 of the p1 values, with replacement, according to their probability of being selected in the mother distribution (call these 50 values p2). For example, -4 is more likely to be selected than -6, which is further into the tail area.





Does my R code below correctly capture what I described above?


mother <- rnorm(1e6)
p1 <- mother[mother <= -4]
p2 <- sample(p1, 50, replace = TRUE) # How can I define probability of being selected here?







3 Answers



You can use the prob argument of the sample function. Quoting from help("sample"):





prob a vector of probability weights for obtaining the elements of
the vector being sampled.



And in the section Details:





The optional prob argument can be used to give a vector of weights for
obtaining the elements of the vector being sampled. They need not sum
to one, but they should be non-negative and not all zero.
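To make the quoted rule concrete, here is a minimal sketch with made-up values and weights (the vector x, the seed and the 3:1:0 weights are illustrative assumptions, not part of the question). It suggests that weights which do not sum to one are rescaled, and that a zero weight excludes an element entirely.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

x <- c("a", "b", "c")  # hypothetical values to sample from

# Weights 3:1:0 do not sum to one; sample() rescales them internally.
# "c" has weight zero, so it can never be drawn.
draws <- sample(x, 1000, replace = TRUE, prob = c(3, 1, 0))

# Observed shares: roughly 0.75 for "a" and 0.25 for "b"
table(draws) / length(draws)
```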



So you must be careful: the farther a value is from the mean, the smaller its probability, and the tail probabilities of the normal distribution drop off very quickly.


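set.seed(1315) # Make the results reproducible

mother <- rnorm(1e6)
p1 <- mother[mother <= -4]   # keep only the draws in the lower tail

# Weight each value by pnorm(p1), the standard normal CDF at that value
p2 <- sample(p1, 50, replace = TRUE, prob = pnorm(p1))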



You can check the result with a histogram.


hist(p2)





@rnorouzian Mine assumes a normal distribution, TUSHAr uses the uniform distribution.
– Rui Barradas
Aug 13 at 4:16





@rnorouzian I'm getting an error, Error in sample.int(length(x), size, replace, prob) : incorrect number of probabilities.
– Rui Barradas
Aug 13 at 4:23
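For readers who hit the same error: it typically means the prob vector and the sampled vector have different lengths. A minimal sketch with made-up numbers (x and the weights below are hypothetical, not taken from the thread):

```r
x <- c(-4.1, -4.5, -5.0)  # three made-up tail values

# Two weights for three values: sample() stops with an error,
# which tryCatch() captures as a message string.
msg <- tryCatch(
  sample(x, 2, replace = TRUE, prob = c(0.6, 0.4)),
  error = function(e) conditionMessage(e)
)
msg

# One weight per element of x works.
ok <- sample(x, 2, replace = TRUE, prob = c(0.6, 0.3, 0.1))
ok
```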







@rnorouzian Yes, and with the example dataset it is: mean(p1) returns [1] -4.208714 and mean(p2) returns [1] -4.125671.
– Rui Barradas
Aug 13 at 4:32








@rnorouzian Yes, it would. The bigger the sd the bigger the effect.
– Rui Barradas
Aug 13 at 4:39







I don't think this method correctly samples from the normal distribution (conditional on x <= -4). p1 should already reflect the relative probabilities of say, -5 < x <= -4 compared to -6 < x <= -5. By sampling again from p1, using pdf(x) as sampling weights, you double down on that difference, exaggerating it.
– Marius
Aug 13 at 5:09
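The point in this comment can be checked with a rough sketch. It assumes a standard normal mother, reuses the seed from the answer, and uses dnorm(p1) as the density weights the comment refers to (the answer's code uses pnorm(p1) instead). The names p2_plain, p2_weighted and theory_mean are introduced here only for illustration.

```r
set.seed(1315)

mother <- rnorm(1e6)
p1 <- mother[mother <= -4]

# Unweighted resample: p1 is already a sample from the normal
# distribution conditional on x <= -4, so equal weights preserve it.
p2_plain <- sample(p1, 50, replace = TRUE)

# Density-weighted resample: values near -4 are favoured a second time.
p2_weighted <- sample(p1, 50, replace = TRUE, prob = dnorm(p1))

# Theoretical mean of a standard normal truncated to x <= -4 (about -4.23)
theory_mean <- -dnorm(-4) / pnorm(-4)

c(theory = theory_mean, p1 = mean(p1),
  plain = mean(p2_plain), weighted = mean(p2_weighted))
```

With only a few dozen values in p1 the sample means are noisy, so a single run is suggestive rather than conclusive.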





Wouldn't it be easier to sample from a truncated normal distribution in the first place?


truncnorm::rtruncnorm(50, a = -Inf, b = -4)
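If installing truncnorm is not an option, a base R alternative is inverse-CDF sampling. This is a sketch that assumes a standard normal mother; the names upper, u and p2 are illustrative. The idea is to draw uniform probabilities below pnorm(-4) and map them back through qnorm.

```r
set.seed(1315)  # arbitrary seed for reproducibility

upper <- -4  # truncation point: keep only x <= upper

# Uniform draws on (0, P(X <= upper)), then map back to the x scale.
u  <- runif(50, min = 0, max = pnorm(upper))
p2 <- qnorm(u)

range(p2)  # both ends should lie at or below -4
```

Unlike filtering rnorm(1e6), this does not discard almost all of the draws, and it always returns exactly 50 values.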



I think you are looking for something like this:


mother <- rnorm(1e6)
p1 <- mother[mother <= -4]



Then calculate the probability of each p1 value being selected from mother:




p2 <- sample(p1, 50, replace = TRUE, prob = pnorm(p1, mean = mean(mother), sd = sd(mother)))





