- Magnus Skonberg (1.1)
September 8, 2020
There are two key properties of probability models:
This semester we will examine two interpretations of probability:
Frequentist interpretation: The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.
Bayesian interpretation: Probability is a subjective degree of belief, so two people could hold different viewpoints on the same event and assign it different probabilities. This interpretation was largely popularized by revolutionary advances in computational technology and methods during the last twenty years.
The law of large numbers states that as more observations are collected, the proportion of occurrences with a particular outcome, \({\hat{p}}_n\), converges to the probability of that outcome, \(p\).
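The convergence of \({\hat{p}}_n\) to \(p\) can be seen in a small simulation. The following is a minimal sketch, assuming a fair coin (\(p = 0.5\)); the seed, the number of tosses, and the variable names are arbitrary choices made for illustration.

```r
set.seed(606)  # arbitrary seed, used only so the run is reproducible

n <- 10000
tosses <- sample(c(0, 1), n, replace = TRUE)  # 1 = heads, 0 = tails

# Running proportion of heads after each toss: p_hat[i] uses tosses 1..i
p_hat <- cumsum(tosses) / seq_len(n)

# The running proportion should settle near the dashed line at p = 0.5
plot(seq_len(n), p_hat, type = 'l', log = 'x',
     xlab = 'Number of tosses', ylab = 'Proportion of heads')
abline(h = 0.5, lty = 2)
```

Early values of `p_hat` can swing widely, while later values typically stay close to 0.5.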
When tossing a fair coin, if heads comes up on each of the first 10 tosses, what do you think the chance is that another head will come up on the next coin toss? 0.5, less than 0.5, or greater than 0.5?
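One way to explore this question empirically is to look at what follows a streak of heads in a long simulated sequence. This is a rough sketch, assuming independent fair tosses; it uses streaks of 3 heads rather than 10 (an illustrative choice, since streaks of 10 are rare), and the variable names are hypothetical.

```r
set.seed(606)  # arbitrary seed, used only so the run is reproducible

tosses <- sample(c(0, 1), 100000, replace = TRUE)  # 1 = heads, 0 = tails
k <- 3  # streak length (assumption: shorter than 10 so streaks are common)

# Each row of m holds k + 1 consecutive tosses, most recent toss first:
# column 1 is the "next" toss, columns 2..(k + 1) are the k tosses before it
m <- embed(tosses, k + 1)

# Keep the next toss only where the k preceding tosses were all heads
after_streak <- m[rowSums(m[, -1]) == k, 1]

length(after_streak)  # number of streaks found
mean(after_streak)    # proportion of heads immediately after a streak
```

If the tosses are independent, the proportion of heads after a streak should be close to 0.5, the same as for any other toss.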
library(DATA606)
shiny_demo('gambler')
coins <- sample(c(-1,1), 1000, replace=TRUE)
plot(1:length(coins), cumsum(coins), type='l')
abline(h=0)
plot(1:length(coins), cumsum(coins), type='l', ylim=c(-1000, 1000))
abline(h=0)
Disjoint (mutually exclusive) outcomes: Cannot happen at the same time.
Non-disjoint outcomes: Can happen at the same time.
A probability distribution lists all possible events and the probabilities with which they occur.
Event | Male | Female |
---|---|---|
Probability | 0.5 | 0.5 |
Rules for probability distributions:
The probability distribution for the genders of two kids:
Event | MM | FF | MF | FM |
---|---|---|---|---|
Probability | 0.25 | 0.25 | 0.25 | 0.25 |
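The table above can be reproduced by enumerating the sample space. This is a minimal sketch, assuming each child is male or female with probability 0.5 and that the two births are independent; the data frame and column names are chosen for illustration.

```r
# All ordered combinations of two children
kids <- expand.grid(first = c('M', 'F'), second = c('M', 'F'),
                    stringsAsFactors = FALSE)
kids$Event <- paste0(kids$first, kids$second)

# Under independence, each outcome has probability 0.5 * 0.5
kids$Probability <- 0.5 * 0.5

kids[, c('Event', 'Probability')]

# A valid probability distribution must total 1
sum(kids$Probability)
```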
Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other.
If P(A occurs, given that B is true) = P(A | B) = P(A), then A and B are independent.
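The condition P(A | B) = P(A) can be checked directly on a small sample space. The sketch below reuses the two-children setting with four equally likely outcomes (an assumption); `cond_prob` is a hypothetical helper written for this example, not a function from any package.

```r
# Sample space: gender of first and second child, each outcome with p = 0.25
space <- expand.grid(first = c('M', 'F'), second = c('M', 'F'),
                     stringsAsFactors = FALSE)
space$p <- 0.25

# P(a | b) = P(a and b) / P(b), where a and b are logical vectors over outcomes
cond_prob <- function(a, b, p) sum(p[a & b]) / sum(p[b])

A <- space$first == 'M'    # first child is male
B <- space$second == 'M'   # second child is male

p_A <- sum(space$p[A])
p_A_given_B <- cond_prob(A, B, space$p)
c(p_A, p_A_given_B)        # equal, so A and B are independent

# Contrast: C = "both children are male" is not independent of A
C <- A & B
p_C <- sum(space$p[C])
p_C_given_A <- cond_prob(C, A, space$p)
c(p_C, p_C_given_A)        # differ, so C and A are dependent
```

Knowing the second child's gender tells us nothing about the first, while knowing the first child is male doubles the chance that both are.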
P(protects citizens) varies by race/ethnicity; therefore, opinion on gun ownership and race/ethnicity are most likely dependent.
shiny_demo('lottery')
A random variable is a numeric quantity whose value depends on the outcome of a random event.
There are two types of random variables:
\[ \mu =E\left( X \right) =\sum _{ i=1 }^{ k }{ { x }_{ i }P\left( X={ x }_{ i } \right) } \]
In a game of cards you win $1 if you draw a heart, $5 if you draw an ace (including the ace of hearts), $10 if you draw the king of spades and nothing for any other card you draw. Write the probability model for your winnings, and calculate your expected winning.
Event | X | P(X) | X P(X) |
---|---|---|---|
Heart (not Ace) | 1 | 12/52 | 12/52 |
Ace | 5 | 4/52 | 20/52 |
King of Spades | 10 | 1/52 | 10/52 |
All else | 0 | 35/52 | 0 |
Total | | | \(E(X) = \frac{42}{52} \approx 0.81\) |
# Probability model for the card game (the ace of hearts counts as an ace)
cards <- data.frame(
  Event = c('Heart (not ace)', 'Ace', 'King of Spades', 'All else'),
  X = c(1, 5, 10, 0),
  pX = c(12/52, 4/52, 1/52, 35/52)
)
cards$XpX <- cards$X * cards$pX
# Probability of each possible winning amount from $0 to $10
cards2 <- rep(0, 11)
cards2[cards$X + 1] <- cards$pX
names(cards2) <- 0:10
barplot(cards2, main='Probability of Winning Game')
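The expected winnings can also be checked by building the deck explicitly and simulating draws. This is a sketch under the rules stated in the exercise (an ace, including the ace of hearts, pays $5); the `payout` helper, the seed, and the number of draws are assumptions made for illustration.

```r
set.seed(606)  # arbitrary seed, used only so the run is reproducible

# Build a standard 52-card deck
ranks <- c('A', 2:10, 'J', 'Q', 'K')
suits <- c('hearts', 'diamonds', 'clubs', 'spades')
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

# Winnings for each card; the ace rule is checked first, so the ace of
# hearts pays $5 rather than $1
payout <- function(rank, suit) {
  ifelse(rank == 'A', 5,
         ifelse(rank == 'K' & suit == 'spades', 10,
                ifelse(suit == 'hearts', 1, 0)))
}
deck$X <- payout(deck$rank, deck$suit)

# Exact expected value: every card is equally likely (1/52)
exact <- mean(deck$X)

# Simulated expected value: average winnings over many single-card draws
draws <- sample(deck$X, 100000, replace = TRUE)
simulated <- mean(draws)

c(exact = exact, simulated = simulated)
```

The exact value equals 42/52, and the simulated average should land close to it.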
tickets <- as.data.frame(rbind(
  c(    '$1',    1,     15),
  c(    '$2',    2,     11),
  c(    '$4',    4,     62),
  c(    '$5',    5,    100),
  c(   '$10',   10,    143),
  c(   '$20',   20,    250),
  c(   '$30',   30,    562),
  c(   '$50',   50,   3482),
  c(  '$100',  100,   6681),
  c(  '$500',  500,  49440),
  c( '$1500', 1500, 375214),
  c( '$2500', 2500, 618000)
), stringsAsFactors=FALSE)
names(tickets) <- c('Winnings', 'Value', 'Odds')
tickets$Value <- as.integer(tickets$Value)
tickets$Odds <- as.integer(tickets$Odds)
odds <- sample(max(tickets$Odds), 1000, replace=TRUE)
vals <- rep(-1, length(odds))
for(i in 1:nrow(tickets)) {
  vals[odds %% tickets[i,'Odds'] == 0] <- tickets[i,'Value'] - 1
}
head(vals, n=20)
## [1] -1 -1 -1 1 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 -1 3 -1 -1 -1
mean(vals)
## [1] -0.533
library(ggplot2)
ggplot(data.frame(Winnings=vals), aes(x=Winnings)) + geom_bar()
\[ \mu =E\left( X \right) =\sum _{ i=1 }^{ k }{ { x }_{ i }P\left( X={ x }_{ i } \right) } \]
tickets$xPx <- tickets$Value / tickets$Odds  # payout times P(win), where P(win) = 1 / Odds
tickets
##    Winnings Value   Odds         xPx
## 1        $1     1     15 0.066666667
## 2        $2     2     11 0.181818182
## 3        $4     4     62 0.064516129
## 4        $5     5    100 0.050000000
## 5       $10    10    143 0.069930070
## 6       $20    20    250 0.080000000
## 7       $30    30    562 0.053380783
## 8       $50    50   3482 0.014359563
## 9      $100   100   6681 0.014967819
## 10     $500   500  49440 0.010113269
## 11    $1500  1500 375214 0.003997719
## 12    $2500  2500 618000 0.004045307
sum(tickets$xPx) - 1 # Expected value for one ticket
## [1] -0.3862045
Simulated
nGames <- 1
runs <- numeric(10000)
for(j in seq_along(runs)) {
  odds <- sample(max(tickets$Odds), nGames, replace = TRUE)
  vals <- rep(-1, length(odds))
  for(i in 1:nrow(tickets)) {
    vals[odds %% tickets[i,'Odds'] == 0] <- tickets[i,'Value'] - 1
  }
  runs[j] <- cumsum(vals)[nGames]
}
mean(runs)
## [1] -0.4385
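The simulated mean differs somewhat from the theoretical expected value, which is to be expected with a payout distribution this skewed. The sketch below estimates how much a mean of 10,000 single-ticket runs might vary. It assumes each prize is won with probability 1/Odds and that the prizes are mutually exclusive, which is a simplification of the modulo-based simulation above; the vector names are chosen for illustration.

```r
# Prize values and odds, copied from the tickets table
value <- c(1, 2, 4, 5, 10, 20, 30, 50, 100, 500, 1500, 2500)
odds  <- c(15, 11, 62, 100, 143, 250, 562, 3482, 6681, 49440, 375214, 618000)

# Assumption: P(prize i) = 1 / odds[i], prizes mutually exclusive
p <- 1 / odds
p_lose <- 1 - sum(p)

# Net outcome per ticket: prize minus the $1 ticket price, or -1 for a loss
net  <- c(value - 1, -1)
prob <- c(p, p_lose)

ev     <- sum(net * prob)                   # expected net value of one ticket
sd_net <- sqrt(sum(net^2 * prob) - ev^2)    # standard deviation of one ticket
se     <- sd_net / sqrt(10000)              # standard error of a 10,000-run mean

c(expected = ev, sd = sd_net, se = se)
```

Under these assumptions `ev` matches the theoretical value computed from `tickets$xPx`, and the gap between the simulated and theoretical means is of the same order as the standard error.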