M12 Simulations
Topics in Insurance, Risk, and Finance
Introduction
Learning outcomes
- Understand why Excel is needed for evaluating cash series.
- Can describe the process of simulation.
- Understand the theoretical basis for simulations and the Monte Carlo estimator.
- Can compute the Monte Carlo estimator and its confidence interval.
- Implement simulations to model rates of return and calculate related risk metrics in Excel.
Why Excel
- We can use Excel (also other programming languages) to simulate random variables, and use these to generate random observations of cash flow series.
- These generations will result in an empirical distribution of the cash flow series. We can then use this distribution to estimate moments, quantiles, ranges, and probabilities.
- Simulation is widely used in the work by actuaries.
- Watching someone else use Excel has limited value, however, and so you must attempt these in your own time.
Theoretical preparation
Generating random numbers: uniform distribution
- A computer cannot generate true random numbers.
- Instead, Excel (and other programs) generate approximations that are called pseudorandom numbers.
- Once the parameter of the simulation is fixed, the same random samples will be generated each time.
- We will refer to pseudorandom numbers as random for brevity.
- There are multiple ways to generate random numbers in Excel.
- To generate random numbers from a uniform distribution on \((0,1)\), we can use the
RAND()(orRANDARRAY) function, which gives us realizations of independent and identically distributed uniform random variables. - Generating uniform random numbers is at the core of simulations.
Inverse transformation
Generating random numbers: inverse transformation
Therefore, we can use inverse transformation to generate random numbers of a distribution as long as the (generalized) inverse of the distribution is known (i.e., VaR).
That is, for a random variable \(X\), we first generate a sample of a standard uniform random variable \(u_1,\dots,u_n\). Then, by inverse transformation, a sample of \(X\sim F\) is \[F^{-1}(u_1),\dots,F^{-1}(u_n).\]
Note that not all distributions have nice formulas for their generalized inverse (e.g., normal distributions). One may need to rely on \(z\)-table for normal distributions.
Generating random numbers: inverse functions in Excel
- The inverse functions of many commonly used distributions are available in Excel.
- The general form of such inverse function is
F.INV(p,...): Ifp = F.DIST(x,...), thenF.INV(p,...) = x. - Here,
F.DISTis the distribution function.
Some inverse functions are summarized below
-
BETA.INVfor Beta distribution -
LOGNORM.INVfor lognormal distribution -
NORM.INVfor normal distribution -
NORM.S.INVfor standard normal distribution
To see more inverse/probability functions Excel, go to Formulas \(\rightarrow\) More Functions \(\rightarrow\) Statistical.
Generating random numbers: example A
Generate random numbers from normal distributions.
- To get a standard normal random number, we take the inverse cumulative normal of a uniform random variable in the range \((0, 1)\). In Excel,
NORMSINV(RAND())will generally do. - To get a large sample of random numbers, one can use
NORMSINV(RANDARRAY()). - Plot a histogram of the sample.
- To get a general normal random variable \(X \sim N(\mu, \sigma^2)\), from a standard normal random variable \(Z\), we simply take \[X =\mu+\sigma Z.\]
Generating random numbers: example B
Generate random numbers from a uniform distribution \(F\) on the interval \((a,b)\) where \(b>a\).
- We first find the inverse function of \(F\), which is \[F^{-1}(p)=a+(b-a)p.\]
- To get a random number from the uniform distribution, we use \(a+(b-a)\times\)
RAND(). - To get a large sample of random numbers, one can use \(a+(b-a)\times\)
RANDARRAY(). - Plot a histogram of the sample.
Generating random numbers: example C I
Generate random numbers from the following distribution \[ F(x)= \begin{cases} 1-\frac{1}{x+1},~~~0\le x<9,\\ 0.95,~~~9\le x<10,\\ 1,~~~x\ge 10. \end{cases} \]
Generating random numbers: example C II
We first find the inverse function \[ F^{-1}(p)= \begin{cases} \frac{1}{1-x}-1,~~~0< p\le0.9,\\ 9,~~~0.9< p\le0.95,\\ 10,~~~0.95<p\le 1. \end{cases} \]
Generating random numbers: example C III
- We first generate a random number from a standard uniform distribution using
RAND(or a sample usingRANDARRAY). Store it in A1. - To write the inverse function, we use the
IFfunction:IF(condition, value1 if the condition is true, value2 if the condition is false) - Logic operator:
AND(condition1,condition2),OR(condition1,condition2) - The inverse function can be written as
IF(A1<=0.9,1/(1-A1)-1,IF(AND(A1>0.9, A1<=0.95),9,10))
Monte Carlo estimator
- Given a sample of a random variable \(X\), we are now able to approximate \[\theta=\E(X).\]
- Let \(X_1,\dots,X_n\) be iid copies of \(X\). The Monte Carlo estimator of \(\theta\) can be obtained by \[\hat \theta=\frac{1}{n}\sum_{i=1}^nX_i.\]
- Note that this estimator also works for functions of random variables, i.e., \(f(X)\).
- If \(f(x)=x^k\), we can approximate the \(k\)th moment.
- If \(f(x)=\id_{\{x\in[a,b]\}}\) , we can approximate the probability of landing in \([a,b]\).
Theoretical basis of simulation: Why does it work?
\(\hat \theta\) is unbiased, i.e., \(\E(\hat \theta )=\theta\).
The Law of Large Numbers: for iid random variables \(X_1,\dots,X_n\), \[\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^nX_i\rightarrow \E(X_1)\] with probability 1. Moreover, for iid random variables \(Y_1,\dots,Y_n\), \[\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^nf(Y_i)\rightarrow \E(f(Y_1))\] with probability 1.
-
Also \[\var(\hat \theta)=\var\left(\frac{1}{n}\sum_{i=1}^nX_i\right)=\frac{1}{n}\var(X).\]
Hence, if \(n\) is large, \(\hat \theta\) can be very close to \(\theta\).
Theoretical basis of simulation: standard error
- By the Central Limit theorem, the distribution of \(\sum_{i=1}^nX_i/n\) can be approximated by \(N(\E(X_1),\var(X_1)/n)\).
- So the error of the simulation, i.e., \(\sum_{i=1}^nX_i/n-\E(X_1)\) is normally distributed with mean \(0\) and variance \(\var(X_1)/n\). The standard deviation \(\sqrt{\var(X_1)/n}\) is called the standard error.
- A problem is that we do not know the variance of \(X\) (even the expectation is unknown, otherwise simulation is redundant).
- For this, we can replace \(\var(X)\) by sample variance \[s_n^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\hat \theta)^2,\] which is an unbiased estimator of \(\var(X)\).
Theoretical basis of simulation: confidence interval I
- By the Law of Large numbers, \(s_n^2\rightarrow \var(X)\) with probability 1.
- Then with the Central Limit Theorem and Slutsky’s Theorem, we have the following result.
Theoretical basis of simulation: confidence interval II
By the previous theorem, a \(1-\alpha\) confidence interval of \(\theta\) is \[\left(\hat \theta -z_c\frac{s_n}{\sqrt n},\hat \theta +z_c\frac{s_n}{\sqrt n}\right)\] where \(\p(-z_c< Z< z_c)=1-\alpha\). Here \(Z\sim N(0,1)\).
Example: MC estimate I
Suppose that interest rates \(i_1,i_2,i_3\sim U(0,0.5)\) are independent. An agent simulates 3 paths of \(i_1,i_2,i_3\) using inverse transformation. The random numbers, generated from a standard uniform distribution, for \(i_1,i_2,i_3\) are documented sequentially below:Compute a Monte Carlo estimate of the mean of \(S_3=\prod_{t=1}^3(1+i_t)\) with a \(90\%\) confidence interval.
Example: MC estimate II
The simulated values of \(i_1,\dots,i_3\) are: Hence \(S_3\) in each path is:Example: MC estimate III
Therefore, the Monte Carlo estimate of \(\E(S_3)\) is \[\frac{1}{3}(1.629+1.679+1.687)=1.665,\] and \(s_n^2=0.000988\). The confidence interval is \[(1.665-1.65*0.01815, 1.665+1.65*0.01815).\]
Some basic Excel functions
Summary statistics
Some functions:
- mean:
AVERAGE() - variance of a sample:
VAR() - variance of a population:
VARP() - standard deviation of a sample:
STDEV() - standard deviation of a population:
STDEVP() - percentile:
PERCENTILE()
Analysis ToolPak
Load the Analysis ToolPak in Excel for Mac
- Click the Tools menu, and then click Excel Add-ins.
- In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
- If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
- If you get a prompt that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
- Quit and restart Excel. Now the Data Analysis command is available on the Data tab.
Analysis ToolPak
Data Analysis can complete basic statistical tasks including:
- generate random numbers (we can use seed here)
- summary statistics
- frequency table
Exercises
Exercise A: \(S_n\) I
- Recall that if rates are iid, even if the accumulation factors do not follow a lognormal distribution, one can still use lognormal distribution to approximate \(S_n\) or \(V_n\).
- This can be seen by noting that \(\log S_n=\sum_{t=1}^n\log(1+i_t).\) By central limit theorem, \(\log S_n\) can be approximated by a normal distribution for large \(n\).
- Next, we will assess the performance of using lognormal distribution as an approximation.
Exercise A: \(S_n\) II
Suppose that in a varying rate model, \(i_1,\dots,i_{30}\sim U(0.06,0.10)\). Complete the following tasks:
- Generate a random sample of \((1+i_t)\) with size \(6000\times 30\).
- Compute the first and second moments of \(S_{30}\) numerically.
- Plot the mean of \(S_{30}\) against the simulation number \(k=1,\dots,6000\).
- Compute the first and second moments of \(S_{30}\) analytically and compare them with the numerical result.
- Use the analytical results for moments of \(\log S_{30}\) as the corresponding parameters of a lognormal distribution. Compare the histogram of the sample of \(S_{30}\) and the density of the lognormal distribution. Explain the observation.
- Use
FREQUENCYto generate frequency table given Bin series.
Compare the simulated results with lognormal distribution.
Exercise A: \(S_n\) III
We compute the first and second moments of \(S_{30}\) analytically.
Exercise A: \(S_n\) IV
Some observations:
- The mean and second moment of the sample are close to the analytical solution.
- As the simulation number increases, the mean of \(S_{30}\) becomes more and more stable and it is closer to the true value.
- The simulated distribution of \(S_{30}\) is very close to the lognormal distribution. This is due to the central limit theorem: as \(\log S_{30}\) is close to a normal distribution, \(S_{30}\) is close to a lognormal distribution.
Exercise A: \(S_n\) Remarks
Some remarks:
- To fill a column/row, use the fill function (Home \(\rightarrow\) Editing \(\rightarrow\) Fill \(\rightarrow\) Series).
- To select a large range of cells, use the name box in the top left of the workbook (e.g., A1:B2).
- When using the autofill function the columns should be adjacent.
- Although Excel states the lognormal function as
LOGNORM.DIST(x,mean,standard dev), the mean and standard deviation in this function refer to the equivalent normal distribution mean and standard deviation.
Exercise B: \(A_n\) I
Suppose that \(1+i_1,\dots,1+i_{10}\sim LN(0.09,0.04^2)\) are independent. Complete the following tasks.
- Calculate the mean and standard deviation of \(A_{10}\) using recursive formulae.
- Calculate the mean and standard deviation of \(A_{10}\) numerically using 5000 simulations.
- Estimate the \(5\%\) and \(95\%\) percentiles of \(A_{10}\).
- Estimate \(\p(A_{10}<14.75)\).
- Get summary statistics of \(A_{10}\).
Exercise B: \(A_n\) II
The recursive formulae: Since \(A_{n}=(1+i_n)(1+A_{n-1})\), we have \[\E(A_{n})=\E((1+i_n))(1+\E(A_{n-1})),\] and \[\E(A_{n}^2)=\E((1+i_n)^2)(1+2\E(A_{n-1})+\E(A_{n-1}^2)).\]
Simulations of the compound Poisson distribution
Complete the following tasks in Excel.
- Generate 500 samples from \(Poisson(\lambda)\) where \(\lambda=5,10,20,30\).
- Generate 500 samples from the compound Poisson distributions, for which the individual claim follows the exponential distribution with mean 1.
- Compute the sample means of the above simulations and compare with theoretical values.
- Draw histograms of the samples of the compound Poisson distributions for \(\lambda=5,10,20,30\). What is one key observation here?
Simulations
Assume that \(N(t)\) is a Poisson process with parameter \(30\), the individual claim amount distribution is lognormal with parameters \(\mu=3\) and \(\sigma^2=1.1\), \(c=1200\), and \(u=1000\). Find \(P(U(2)<0)\) by simulations.
Simulations
In a classical risk process,
- \(X_i\) follows the Gamma distribution with parameters \(\alpha=0.5\) and \(\beta=2\);
- average number of claims per unit of time is 1;
- premium loading factor is 0.1.
Find \(\psi_{4}(3.5)\) using simulations.
