*[CEGM1000 MUDE](http://mude.citg.tudelft.nl/): Week 2.8, January 17, 2025.*
%% Cell type:markdown id: tags:
In this notebook we will introduce a method for predicting the profit that is possible for a given selection of tickets, and set up a Monte Carlo simulation to evaluate its distribution.
Profit, $B$, can be evaluated as follows _given that one of our $N_t$ tickets is the winning ticket_:
$$
B = \frac{W}{N_{w}} - 3N_t
$$
where:
- $B$ is profit (or benefit; we use $B$ to avoid confusion with $P$), in USD
- $W$ is the total payout, in USD
- $N_{w}$ is the number of winning tickets purchased (by other participants) for the winning minute, $m_w$
- $N_t$ is the number of tickets we purchase
- 3 is the cost per ticket (in USD)
Note in particular that the **number of (potential) winning tickets purchased by participants** has a big impact on the winnings. It also varies (quite significantly!) by minute, which can be denoted as $N_{w}(m)$, where $m$ is the minute. We don't have the time to go into the details here, but suffice it to say that the distribution of $N_{w}(m)$ closely follows the historic average of the breakup day and minute (i.e., minutes in the early afternoon in late April are _very_ popular). In addition, individual minutes throughout the day vary significantly in popularity (e.g., minutes ending in 0 are much more popular than those ending in 2).
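As a quick check of the arithmetic, the profit equation can be written as a small function. This is a minimal sketch: the function name, argument names and example numbers are hypothetical, and it assumes $N_w$ counts every winning ticket for the minute (so $N_w \geq 1$ whenever we hold a winning ticket).

``` python
TICKET_COST = 3  # USD per ticket, as stated in the profit equation


def profit(total_payout, n_winning_tickets, n_tickets_bought):
    """Profit B = W / N_w - 3 * N_t, given that one of our tickets wins."""
    if n_winning_tickets < 1:
        # Assumption: at least one winning ticket exists (ours)
        raise ValueError("n_winning_tickets must be at least 1")
    return total_payout / n_winning_tickets - TICKET_COST * n_tickets_bought


# Hypothetical numbers, chosen only to illustrate the arithmetic
example = profit(total_payout=300_000, n_winning_tickets=10, n_tickets_bought=20)
print(example)
```

Doubling `n_winning_tickets` halves the first term, which is why the popularity of a minute matters so much.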
%% Cell type:code id: tags:
``` python
%load_ext autoreload
%autoreload 2

import os
import pickle

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

from tools import *

# Load the per-minute ticket-count distributions
pickle_path = os.path.join('pickles',
                           'tickets_per_minute.pkl')
with open(pickle_path, 'rb') as f:
    loaded_radial_dist_list = pickle.load(f)
```
%% Output
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
%% Cell type:markdown id: tags:
## Part 1: Number of Tickets per Minute
The cell below provides code to generate a histogram of the number of tickets purchased per minute. There is a different distribution defined for each minute of the day.
Run the cell below and change the day and minute to see how the number of tickets varies per day. Try at least one minute near the mode of the historic average breakup and one in the "unlikely" zone.
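To make the indexing and sampling idea concrete, here is a minimal sketch using only the standard library. The helpers `minute_index` and `sample_counts` are hypothetical stand-ins (not the functions from `tools`), and the toy distribution assumes that entry `k` of a probability list is the probability that `k` tickets were purchased for that minute.

``` python
import random
from collections import Counter

MINUTES_PER_DAY = 24 * 60


def minute_index(day, hour, minute):
    """Flat minute index, mirroring day*24*60 + minute-in-day."""
    return day * MINUTES_PER_DAY + hour * 60 + minute


def sample_counts(probabilities, size, rng):
    """Draw `size` integers k with P(k) = probabilities[k] (assumed layout)."""
    values = range(len(probabilities))
    return rng.choices(values, weights=probabilities, k=size)


# Hypothetical distribution over 0..4 tickets for a single minute
toy_probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]
rng = random.Random(42)
toy_sample = sample_counts(toy_probabilities, 1000, rng)

print(minute_index(23, 13, 30))
print(sorted(Counter(toy_sample).items()))
```

With a large enough sample, the relative frequencies in the printed counts should approach the probabilities used to draw them.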
%% Cell type:code id: tags:
``` python
plt.title('Distribution of Number of Tickets Purchased for Specific Minute')
plt.legend()
plt.show()

trial_day = 23
trial_min_in_day = 13*60 + 30
trial_min = trial_day*24*60 + trial_min_in_day
probabilities = loaded_radial_dist_list[trial_min]
sample_size = 1000
sample = sample_integer(probabilities, sample_size)
plot_sample_distribution(sample, probabilities)
```
%% Output
%% Cell type:markdown id: tags:
## Part 2: Expected Value of Profit
Before running a Monte Carlo Simulation, let's get an idea of what to expect using an expected value calculation. Using the equation provided at the beginning of the notebook, combined with the example code from the first notebook, you should be able to evaluate this for any combination of tickets.
Recall that the expectation of discrete events (such as winning tickets) is given by:
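In standard notation (the symbols $X$, $x_i$ and $p_i$ are assumed here), the expectation of a discrete random variable that takes value $x_i$ with probability $p_i$ is:

$$
E[X] = \sum_i x_i \, p_i
$$

A minimal sketch of that sum for a single ticket follows. All numbers are hypothetical and only illustrate the calculation. The sign of the result depends entirely on these assumed inputs.

``` python
def expected_value(outcomes, probabilities):
    """E[X] = sum_i x_i * p_i for a discrete random variable."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))


# Hypothetical single-ticket bet: every number below is illustrative only
p_win = 0.001       # assumed probability that our minute wins
payout = 300_000    # assumed total payout W, in USD
n_winners = 60      # assumed number of winning tickets N_w
cost = 3            # ticket cost from the profit equation, in USD

outcomes = [payout / n_winners - cost, -cost]  # profit if we win, if we lose
probabilities = [p_win, 1 - p_win]
expected_profit = expected_value(outcomes, probabilities)
print(round(expected_profit, 2))
```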
Technically, no, because you expect to make a profit. However, most of us probably would make the bet, as $3 is a small amount, and there is still the chance that you could win big!
%% Cell type:markdown id: tags:
## Part 3: Monte Carlo Simulation
Now the fun part! Rather than looking at a single point estimate of the profit, let's consider the whole _distribution._
The steps of the algorithm are:
1. Select tickets
2. Determine the number of winning tickets for each minute (defined by the tickets and the distributions from our pickle file)
3. Calculate the _winnings_
4. Calculate the _profit_
5. Evaluate the distributions
Note that in this case, the _winnings_ are just the first term of the profit equation above (without the ticket cost):
$$
\textrm{winnings } = \frac{W}{N_{w}}
$$
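The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's implementation: `simulate_winnings` and the toy distributions are hypothetical, only the standard library is used, and entry `k` of each probability list is assumed to be the probability that `k + 1` tickets share the minute (so $N_w \geq 1$, conditional on the ticket winning).

``` python
import random


def simulate_winnings(ticket_distributions, total_payout, n_sim, seed=0):
    """Sample winnings W / N_w for each selected ticket, given that it wins."""
    rng = random.Random(seed)
    winnings = []
    for probs in ticket_distributions:
        counts = range(1, len(probs) + 1)  # N_w >= 1 avoids dividing by zero
        sampled = rng.choices(counts, weights=probs, k=n_sim)
        winnings.append([total_payout / n_w for n_w in sampled])
    return winnings


# Step 1: select tickets (two hypothetical minutes with toy distributions)
toy_distributions = [
    [0.5, 0.3, 0.2],       # 1 to 3 winning tickets
    [0.1, 0.2, 0.3, 0.4],  # 1 to 4 winning tickets
]
W = 300_000  # assumed total payout, in USD

# Steps 2 and 3: sample N_w per ticket and compute the winnings
winnings = simulate_winnings(toy_distributions, W, n_sim=1000)

# Step 4: profit subtracts 3 USD for every ticket purchased
n_tickets = len(toy_distributions)
profits = [[w - 3 * n_tickets for w in row] for row in winnings]

# Step 5: inspect the distributions (here only the range of one ticket)
print(min(winnings[0]), max(winnings[0]))
```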
First we provide a function to help with plotting the results. You will probably want to modify this function later, or make your own, to better suit your needs in later parts of this GA.
<div style="background-color:#facb8e; color: black; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px; width: 95%"><p>We plot winnings because they are always positive and thus easier to visualize on the log-log scale.</p></div>
%% Cell type:markdown id: tags:
<div style="background-color:#facb8e; color: black; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px; width: 95%"><p><b>Note</b> that the analysis above has been done by calculating the winnings after sampling the number of other winners who chose each of the tickets we selected. This produces a distribution that is <em>conditional</em> on one of the tickets selected being a winning ticket. This is why the probabilities in the plots seem high compared to the results found in the first notebook. The following analyses apply the probability of each ticket being a winning ticket, to properly account for this.</p></div>
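One way to remove the conditioning is to first sample which minute wins (or that none of ours does) and only then draw the winnings. The sketch below is a hypothetical illustration of that idea. The function name, the win probabilities and the winnings samples are all assumed. It relies on the fact that at most one minute can win, so the events are mutually exclusive.

``` python
import random


def simulate_unconditional_profit(p_win, conditional_winnings, n_sim, seed=0):
    """Mix the conditional winnings with the chance that no ticket wins.

    p_win: assumed probability that each selected minute is the winning one.
    conditional_winnings: per-ticket samples of W / N_w given that ticket wins.
    """
    rng = random.Random(seed)
    cost = 3 * len(p_win)                        # 3 USD per ticket
    outcomes = list(range(len(p_win))) + [None]  # None: none of our tickets win
    weights = list(p_win) + [1 - sum(p_win)]
    profits = []
    for _ in range(n_sim):
        winner = rng.choices(outcomes, weights=weights, k=1)[0]
        won = 0.0 if winner is None else rng.choice(conditional_winnings[winner])
        profits.append(won - cost)
    return profits


# Hypothetical inputs for two tickets
toy_p_win = [0.001, 0.002]
toy_winnings = [[150_000.0, 300_000.0], [75_000.0, 100_000.0]]
toy_profits = simulate_unconditional_profit(toy_p_win, toy_winnings, n_sim=10_000)

share_losing = sum(p < 0 for p in toy_profits) / len(toy_profits)
print(share_losing)
```

Most simulated outcomes are now a loss equal to the ticket cost, which is what brings the probabilities back in line with an unconditional calculation.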
%% Cell type:markdown id: tags:
And finally, a few summary statistics to help you interpret the results.
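As an illustration of what such a summary could contain, the sketch below computes a handful of statistics with the standard library. The function `summarize`, the choice of statistics and the toy profits are assumptions made for this example, not the notebook's own code.

``` python
import statistics


def summarize(profits):
    """Return a few summary statistics for a list of simulated profits."""
    ordered = sorted(profits)
    # 19 cut points at 5%, 10%, ..., 95%
    cuts = statistics.quantiles(ordered, n=20, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p05": cuts[0],
        "p95": cuts[-1],
        "prob_positive": sum(p > 0 for p in ordered) / len(ordered),
    }


# Hypothetical profits in USD, for illustration only
toy_profits = [-6.0] * 95 + [74_994.0, 99_994.0, 149_994.0, 149_994.0, 299_994.0]
summary = summarize(toy_profits)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For a heavily skewed distribution like this one, the mean and the median can differ in sign, so it helps to report both alongside the probability of a positive profit.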