Determine the minimal length of a genomic region that has a probability larger than P of having at least a single insertion.

Created by: Gregory94

I am developing a bar plot that shows the number of reads for each transposon insertion in a gene, and I have a question about determining the width of the bars. Currently each bar is made wide enough to include a fixed number of insertions (e.g. each bar includes 8 transposon insertions). But for large regions without transposons, or for genes that have no insertions at all, I want to set a maximum width for the bars.
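
A minimal sketch of that bar layout in Python, assuming half-open bars [start, end) and insertion positions given as integer coordinates inside the gene. The function name `bar_edges` and the parameters `per_bar` and `max_width` are hypothetical, and the 150 bp cap is only a placeholder value, not a recommendation.

```python
def bar_edges(positions, gene_start, gene_end, per_bar=8, max_width=150):
    """Return bar boundaries for half-open bars [edge[i], edge[i+1]).

    A bar is closed once it holds `per_bar` insertions or once it would
    exceed `max_width` bp, whichever happens first.
    Assumes every position lies in [gene_start, gene_end).
    """
    edges = [gene_start]
    count = 0
    for pos in sorted(set(positions)):  # duplicates collapsed (assumption)
        # Fill insertion-free stretches with bars of the maximum width.
        while pos - edges[-1] >= max_width:
            edges.append(edges[-1] + max_width)
            count = 0
        count += 1
        if count == per_bar:
            edges.append(pos + 1)  # bar ends just after this insertion
            count = 0
    # Cover the rest of the gene, again capped at max_width per bar.
    while gene_end - edges[-1] > max_width:
        edges.append(edges[-1] + max_width)
    if edges[-1] < gene_end:
        edges.append(gene_end)
    return edges


# A gene without insertions is simply tiled with maximum-width bars.
print(bar_edges([], 0, 400))
```

With this layout no bar is ever wider than `max_width`, so an empty gene still gets a sensible number of (empty) bars.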

I thought maybe I could set the maximum width such that the probability of finding at least one transposon in the region spanned by the bar is greater than 99% (based on the average distance between insertions in the whole chromosome). For this I am using a Poisson distribution, P(k)=(x*L)^k * e^(-x*L) / k!, where x is the total number of insertions divided by the length of the chromosome (insertions per bp), L is the length of the region spanned by the bar, and k is the actual number of insertions in that region. The probability that a region of length L has at least 1 insertion is 1 minus the probability of having zero insertions (k=0), which gives P(L)=1-e^(-x*L). If we want P(L)=0.99, we can solve this equation for L: L=-ln(1-0.99)/x, which is about 4.6/x.
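
The calculation can be sketched in a few lines of Python. This assumes insertions follow a homogeneous Poisson process with a constant rate per bp; the function name `min_region_length` and the example counts (10,000 insertions on a 500,000 bp chromosome) are made up for illustration.

```python
import math


def min_region_length(rate_per_bp, p=0.99):
    """Smallest length L (in bp) with 1 - exp(-rate_per_bp * L) >= p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if rate_per_bp <= 0:
        raise ValueError("rate_per_bp must be positive")
    # Solve 1 - exp(-rate * L) = p for L.
    return -math.log(1.0 - p) / rate_per_bp


# Hypothetical numbers, not taken from real data.
rate = 10_000 / 500_000  # 0.02 insertions per bp
length = min_region_length(rate, p=0.99)
print(math.ceil(length))  # 231
```

Rounding up with `math.ceil` keeps the probability at or above the requested threshold when the width has to be a whole number of base pairs.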

For example, for one chromosome I calculated an average distance between insertions of about 31 bp (so x is about 1/31 insertions per bp), and L then becomes 144 bp. This means that we need a bar width of 144 bp to have a probability of 99% of including at least one transposon. But I am not sure if this is mathematically sound. Does anyone have another (better) solution for this?
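
One way to sanity-check the formula is a small simulation. The sketch below assumes insertions land independently and uniformly along the chromosome, which is exactly the assumption behind the Poisson model; it says nothing about whether real insertion data behave that way. The chromosome length, the insertion count (chosen to give roughly one insertion per 31 bp) and the function name `fraction_of_windows_hit` are all hypothetical.

```python
import bisect
import math
import random


def fraction_of_windows_hit(n_insertions, chrom_length, window, n_windows, seed=0):
    """Fraction of random windows that contain at least one insertion."""
    rng = random.Random(seed)
    # Place insertions uniformly at random (assumption of the model).
    positions = sorted(rng.randrange(chrom_length) for _ in range(n_insertions))
    hits = 0
    for _ in range(n_windows):
        start = rng.randrange(chrom_length - window + 1)
        # Index of the first insertion at or after the window start.
        i = bisect.bisect_left(positions, start)
        if i < len(positions) and positions[i] < start + window:
            hits += 1
    return hits / n_windows


chrom_length = 1_000_000
n_insertions = 32_000  # roughly one insertion per 31 bp
rate = n_insertions / chrom_length
window = math.ceil(-math.log(1 - 0.99) / rate)
observed = fraction_of_windows_hit(n_insertions, chrom_length, window, n_windows=20_000)
print(window, round(observed, 3))
```

If the observed fraction comes out close to 0.99, the formula is behaving as expected under uniform placement. Running the same window test against real insertion positions would show how far the data depart from that assumption.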
