Determine the minimal length of a genomic region that has a probability larger than P of having at least a single insertion.
Created by: Gregory94
I am developing a bar plot that shows the number of reads for each transposon insertion in a gene, and I have a question about how to choose the width of the bars. Currently, each bar is made wide enough to include a fixed number of insertions (e.g. 8 transposon insertions per bar). However, for large regions without any transposon insertions, or for genes that have no insertions at all, I want to set a maximum bar width.
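A minimal sketch of the binning described above, assuming insertions are given as distinct integer positions and bars are half-open intervals [start, end). The function name `make_bars` and its parameters are hypothetical: a bar closes after `per_bar` insertions or once it reaches `max_width` bp, whichever comes first, so stretches without insertions become bars of at most `max_width` bp.

```python
from typing import List, Tuple


def make_bars(positions: List[int], gene_start: int, gene_end: int,
              per_bar: int = 8, max_width: int = 144) -> List[Tuple[int, int, int]]:
    """Split [gene_start, gene_end) into bars as (start, end, n_insertions) tuples.

    Hypothetical helper: a bar closes after `per_bar` insertions or when it
    reaches `max_width` bp, whichever comes first.
    """
    if per_bar < 1 or max_width < 1:
        raise ValueError("per_bar and max_width must both be >= 1")
    # Keep only distinct insertion sites inside the gene, in genomic order.
    pos = sorted({p for p in positions if gene_start <= p < gene_end})
    bars = []
    i = 0
    start = gene_start
    while start < gene_end:
        limit = min(start + max_width, gene_end)  # hard cap on the bar width
        count = 0
        # Take insertions inside this bar's window until the bar is full.
        while i < len(pos) and pos[i] < limit and count < per_bar:
            i += 1
            count += 1
        # A full bar ends just after its last insertion; otherwise use the cap.
        end = pos[i - 1] + 1 if count == per_bar else limit
        bars.append((start, end, count))
        start = end
    return bars
```

For a gene without insertions, `make_bars([], 0, 300)` returns `[(0, 144, 0), (144, 288, 0), (288, 300, 0)]`, i.e. the gene is covered by empty bars no wider than the cap.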
I thought I could set the maximum width such that the probability of finding at least one transposon insertion in the region spanned by a bar is greater than 99% (based on the average density of insertions over the whole chromosome). For this I use a Poisson distribution: P(k) = (x·L)^k · e^(-x·L) / k!, where x is the total number of insertions divided by the length of the chromosome (insertions per bp), L is the length of the region spanned by the bar, and k is the number of insertions in that region. The probability of having at least 1 insertion in the region is 1 minus the probability of having zero insertions (k = 0), which gives P(L) = 1 - e^(-x·L). Setting P(L) = 0.99 and solving for L gives L = -ln(1 - 0.99) / x = ln(100) / x ≈ 4.6 / x.
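The formula above can be turned into a small helper. This is a sketch that assumes insertions follow a Poisson process with one uniform density x over the chromosome; the name `max_bar_width` is hypothetical. It rounds L up to whole base pairs so that the 99% target is still met.

```python
import math


def max_bar_width(n_insertions: int, chrom_length: int, p: float = 0.99) -> int:
    """Smallest length L (bp) with P(at least one insertion) >= p.

    Assumes a uniform Poisson density x = n_insertions / chrom_length
    (insertions per bp), so P(L) = 1 - exp(-x * L) and L = -ln(1 - p) / x.
    """
    x = n_insertions / chrom_length
    length = -math.log1p(-p) / x  # log1p(-p) equals ln(1 - p), accurate for p near 1
    return math.ceil(length)  # round up to whole base pairs so that P(L) >= p
```

For a hypothetical chromosome of 31,000 bp with 1,000 insertions (1/x = 31 bp), `max_bar_width(1000, 31000)` returns 143, since 31 · ln(100) ≈ 142.76.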
For example, for one chromosome the average distance between insertions is about 31 bp (i.e. 1/x ≈ 31 bp), so L becomes roughly 144 bp. This means that a bar would need to be about 144 bp wide to have a 99% probability of including at least one transposon insertion. But I am not sure whether this is mathematically sound. Does anyone have another (better) solution for this?
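The 99% figure only holds if insertions are spread uniformly at random along the chromosome, which is the Poisson assumption. One hedged way to check this is to drop many random windows of width L on the chromosome and count how often a window contains at least one observed insertion. The helper `hit_fraction` below is hypothetical and uses made-up uniform data for illustration: on uniform data the fraction should come out near 0.99, while a clearly lower fraction on real insertion positions would suggest clustering.

```python
import bisect
import random
from typing import Sequence


def hit_fraction(positions: Sequence[int], chrom_length: int, width: int,
                 n_windows: int = 20_000, seed: int = 0) -> float:
    """Fraction of random windows of `width` bp containing >= 1 insertion."""
    rng = random.Random(seed)
    pos = sorted(positions)
    hits = 0
    for _ in range(n_windows):
        # Window is the half-open interval [start, start + width).
        start = rng.randrange(0, chrom_length - width + 1)
        j = bisect.bisect_left(pos, start)  # first insertion at or after start
        if j < len(pos) and pos[j] < start + width:
            hits += 1
    return hits / n_windows


if __name__ == "__main__":
    # Hypothetical uniform data: 10,000 distinct sites on 310,000 bp (1/x = 31 bp).
    rng = random.Random(1)
    uniform_sites = rng.sample(range(310_000), 10_000)
    print(hit_fraction(uniform_sites, 310_000, width=143))  # expected near 0.99
```

Running the same function on the real insertion positions of a chromosome, with the width computed from its own x, gives an empirical counterpart to the theoretical 99%.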