M12 Simulations

Topics in Insurance, Risk, and Finance

Author

Affiliation

Yuyu Chen

Department of Economics, University of Melbourne

Published

2026

Introduction

Learning outcomes

Understand why Excel is needed for evaluating cash series.
Can describe the process of simulation.
Understand the theoretical basis for simulations and the Monte Carlo estimator.
Can compute the Monte Carlo estimator and its confidence interval.
Implement simulations to model rates of return and calculate related risk metrics in Excel.

Why Excel

We can use Excel (also other programming languages) to simulate random variables, and use these to generate random observations of cash flow series.
These generations will result in an empirical distribution of the cash flow series. We can then use this distribution to estimate moments, quantiles, ranges, and probabilities.
Simulation is widely used in the work by actuaries.
Watching someone else use Excel has limited value, however, and so you must attempt these in your own time.

Theoretical preparation

Generating random numbers: uniform distribution

A computer cannot generate true random numbers.
Instead, Excel (and other programs) generate approximations that are called pseudorandom numbers.
Once the parameter of the simulation is fixed, the same random samples will be generated each time.
We will refer to pseudorandom numbers as random for brevity.
There are multiple ways to generate random numbers in Excel.
To generate random numbers from a uniform distribution on \((0,1)\), we can use the RAND() (or RANDARRAY) function, which gives us realizations of independent and identically distributed uniform random variables.
Generating uniform random numbers is at the core of simulations.

Inverse transformation

Generating random numbers: inverse transformation

Therefore, we can use inverse transformation to generate random numbers of a distribution as long as the (generalized) inverse of the distribution is known (i.e., VaR).
That is, for a random variable \(X\), we first generate a sample of a standard uniform random variable \(u_1,\dots,u_n\). Then, by inverse transformation, a sample of \(X\sim F\) is \[F^{-1}(u_1),\dots,F^{-1}(u_n).\]
Note that not all distributions have nice formulas for their generalized inverse (e.g., normal distributions). One may need to rely on \(z\)-table for normal distributions.

Generating random numbers: inverse functions in Excel

The inverse functions of many commonly used distributions are available in Excel.
The general form of such inverse function is F.INV(p,...): If p = F.DIST(x,...), then F.INV(p,...) = x.
Here, F.DIST is the distribution function.

Some inverse functions are summarized below

BETA.INV for Beta distribution
LOGNORM.INV for lognormal distribution
NORM.INV for normal distribution
NORM.S.INV for standard normal distribution

To see more inverse/probability functions Excel, go to Formulas \(\rightarrow\) More Functions \(\rightarrow\) Statistical.

Generating random numbers: example A

Generate random numbers from normal distributions.

To get a standard normal random number, we take the inverse cumulative normal of a uniform random variable in the range \((0, 1)\). In Excel, NORMSINV(RAND()) will generally do.
To get a large sample of random numbers, one can use NORMSINV(RANDARRAY()).
Plot a histogram of the sample.
To get a general normal random variable \(X \sim N(\mu, \sigma^2)\), from a standard normal random variable \(Z\), we simply take \[X =\mu+\sigma Z.\]

Generating random numbers: example B

Generate random numbers from a uniform distribution \(F\) on the interval \((a,b)\) where \(b>a\).

We first find the inverse function of \(F\), which is \[F^{-1}(p)=a+(b-a)p.\]
To get a random number from the uniform distribution, we use \(a+(b-a)\times\) RAND().
To get a large sample of random numbers, one can use \(a+(b-a)\times\) RANDARRAY().
Plot a histogram of the sample.

Generating random numbers: example C I

Generate random numbers from the following distribution \[ F(x)= \begin{cases} 1-\frac{1}{x+1},~~~0\le x<9,\\ 0.95,~~~9\le x<10,\\ 1,~~~x\ge 10. \end{cases} \]

Generating random numbers: example C II

We first find the inverse function \[ F^{-1}(p)= \begin{cases} \frac{1}{1-x}-1,~~~0< p\le0.9,\\ 9,~~~0.9< p\le0.95,\\ 10,~~~0.95<p\le 1. \end{cases} \]

Generating random numbers: example C III

We first generate a random number from a standard uniform distribution using RAND (or a sample using RANDARRAY). Store it in A1.
To write the inverse function, we use the IF function: IF(condition, value1 if the condition is true, value2 if the condition is false)
Logic operator: AND(condition1,condition2), OR(condition1,condition2)
The inverse function can be written as IF(A1<=0.9,1/(1-A1)-1,IF(AND(A1>0.9, A1<=0.95),9,10))

Monte Carlo estimator

Given a sample of a random variable \(X\), we are now able to approximate \[\theta=\E(X).\]
Let \(X_1,\dots,X_n\) be iid copies of \(X\). The Monte Carlo estimator of \(\theta\) can be obtained by \[\hat \theta=\frac{1}{n}\sum_{i=1}^nX_i.\]
Note that this estimator also works for functions of random variables, i.e., \(f(X)\).
If \(f(x)=x^k\), we can approximate the \(k\)th moment.
If \(f(x)=\id_{\{x\in[a,b]\}}\) , we can approximate the probability of landing in \([a,b]\).

Theoretical basis of simulation: Why does it work?

\(\hat \theta\) is unbiased, i.e., \(\E(\hat \theta )=\theta\).
The Law of Large Numbers: for iid random variables \(X_1,\dots,X_n\), \[\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^nX_i\rightarrow \E(X_1)\] with probability 1. Moreover, for iid random variables \(Y_1,\dots,Y_n\), \[\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^nf(Y_i)\rightarrow \E(f(Y_1))\] with probability 1.
Also \[\var(\hat \theta)=\var\left(\frac{1}{n}\sum_{i=1}^nX_i\right)=\frac{1}{n}\var(X).\]

Hence, if \(n\) is large, \(\hat \theta\) can be very close to \(\theta\).

Theoretical basis of simulation: standard error

By the Central Limit theorem, the distribution of \(\sum_{i=1}^nX_i/n\) can be approximated by \(N(\E(X_1),\var(X_1)/n)\).
So the error of the simulation, i.e., \(\sum_{i=1}^nX_i/n-\E(X_1)\) is normally distributed with mean \(0\) and variance \(\var(X_1)/n\). The standard deviation \(\sqrt{\var(X_1)/n}\) is called the standard error.
A problem is that we do not know the variance of \(X\) (even the expectation is unknown, otherwise simulation is redundant).
For this, we can replace \(\var(X)\) by sample variance \[s_n^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\hat \theta)^2,\] which is an unbiased estimator of \(\var(X)\).

Theoretical basis of simulation: confidence interval I

By the Law of Large numbers, \(s_n^2\rightarrow \var(X)\) with probability 1.
Then with the Central Limit Theorem and Slutsky’s Theorem, we have the following result.

Theoretical basis of simulation: confidence interval II

By the previous theorem, a \(1-\alpha\) confidence interval of \(\theta\) is \[\left(\hat \theta -z_c\frac{s_n}{\sqrt n},\hat \theta +z_c\frac{s_n}{\sqrt n}\right)\] where \(\p(-z_c< Z< z_c)=1-\alpha\). Here \(Z\sim N(0,1)\).

Example: MC estimate I

Suppose that interest rates \(i_1,i_2,i_3\sim U(0,0.5)\) are independent. An agent simulates 3 paths of \(i_1,i_2,i_3\) using inverse transformation. The random numbers, generated from a standard uniform distribution, for \(i_1,i_2,i_3\) are documented sequentially below:

Compute a Monte Carlo estimate of the mean of \(S_3=\prod_{t=1}^3(1+i_t)\) with a \(90\%\) confidence interval.

Example: MC estimate II

The simulated values of \(i_1,\dots,i_3\) are: Hence \(S_3\) in each path is:

Example: MC estimate III

Therefore, the Monte Carlo estimate of \(\E(S_3)\) is \[\frac{1}{3}(1.629+1.679+1.687)=1.665,\] and \(s_n^2=0.000988\). The confidence interval is \[(1.665-1.65*0.01815, 1.665+1.65*0.01815).\]

Some basic Excel functions

Summary statistics

Some functions:

mean: AVERAGE()
variance of a sample: VAR()
variance of a population: VARP()
standard deviation of a sample: STDEV()
standard deviation of a population: STDEVP()
percentile: PERCENTILE()

Analysis ToolPak

Load the Analysis ToolPak in Excel for Mac

Click the Tools menu, and then click Excel Add-ins.
In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
If you get a prompt that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
Quit and restart Excel. Now the Data Analysis command is available on the Data tab.

Analysis ToolPak

Data Analysis can complete basic statistical tasks including:

generate random numbers (we can use seed here)
summary statistics
frequency table

Exercises

Exercise A: \(S_n\) I

Recall that if rates are iid, even if the accumulation factors do not follow a lognormal distribution, one can still use lognormal distribution to approximate \(S_n\) or \(V_n\).
This can be seen by noting that \(\log S_n=\sum_{t=1}^n\log(1+i_t).\) By central limit theorem, \(\log S_n\) can be approximated by a normal distribution for large \(n\).
Next, we will assess the performance of using lognormal distribution as an approximation.

Exercise A: \(S_n\) II

Suppose that in a varying rate model, \(i_1,\dots,i_{30}\sim U(0.06,0.10)\). Complete the following tasks:

Generate a random sample of \((1+i_t)\) with size \(6000\times 30\).
Compute the first and second moments of \(S_{30}\) numerically.
Plot the mean of \(S_{30}\) against the simulation number \(k=1,\dots,6000\).
Compute the first and second moments of \(S_{30}\) analytically and compare them with the numerical result.
Use the analytical results for moments of \(\log S_{30}\) as the corresponding parameters of a lognormal distribution. Compare the histogram of the sample of \(S_{30}\) and the density of the lognormal distribution. Explain the observation.
Use FREQUENCY to generate frequency table given Bin series.

Compare the simulated results with lognormal distribution.

Exercise A: \(S_n\) III

We compute the first and second moments of \(S_{30}\) analytically.

Exercise A: \(S_n\) IV

Some observations:

The mean and second moment of the sample are close to the analytical solution.
As the simulation number increases, the mean of \(S_{30}\) becomes more and more stable and it is closer to the true value.
The simulated distribution of \(S_{30}\) is very close to the lognormal distribution. This is due to the central limit theorem: as \(\log S_{30}\) is close to a normal distribution, \(S_{30}\) is close to a lognormal distribution.

Exercise A: \(S_n\) Remarks

Some remarks:

To fill a column/row, use the fill function (Home \(\rightarrow\) Editing \(\rightarrow\) Fill \(\rightarrow\) Series).
To select a large range of cells, use the name box in the top left of the workbook (e.g., A1:B2).
When using the autofill function the columns should be adjacent.
Although Excel states the lognormal function as LOGNORM.DIST(x,mean,standard dev), the mean and standard deviation in this function refer to the equivalent normal distribution mean and standard deviation.

Exercise B: \(A_n\) I

Suppose that \(1+i_1,\dots,1+i_{10}\sim LN(0.09,0.04^2)\) are independent. Complete the following tasks.

Calculate the mean and standard deviation of \(A_{10}\) using recursive formulae.
Calculate the mean and standard deviation of \(A_{10}\) numerically using 5000 simulations.
Estimate the \(5\%\) and \(95\%\) percentiles of \(A_{10}\).
Estimate \(\p(A_{10}<14.75)\).
Get summary statistics of \(A_{10}\).

Exercise B: \(A_n\) II

The recursive formulae: Since \(A_{n}=(1+i_n)(1+A_{n-1})\), we have \[\E(A_{n})=\E((1+i_n))(1+\E(A_{n-1})),\] and \[\E(A_{n}^2)=\E((1+i_n)^2)(1+2\E(A_{n-1})+\E(A_{n-1}^2)).\]

Simulations of the compound Poisson distribution

Complete the following tasks in Excel.

Generate 500 samples from \(Poisson(\lambda)\) where \(\lambda=5,10,20,30\).
Generate 500 samples from the compound Poisson distributions, for which the individual claim follows the exponential distribution with mean 1.
Compute the sample means of the above simulations and compare with theoretical values.
Draw histograms of the samples of the compound Poisson distributions for \(\lambda=5,10,20,30\). What is one key observation here?

Simulations

Assume that \(N(t)\) is a Poisson process with parameter \(30\), the individual claim amount distribution is lognormal with parameters \(\mu=3\) and \(\sigma^2=1.1\), \(c=1200\), and \(u=1000\). Find \(P(U(2)<0)\) by simulations.

Simulations

In a classical risk process,

\(X_i\) follows the Gamma distribution with parameters \(\alpha=0.5\) and \(\beta=2\);
average number of claims per unit of time is 1;
premium loading factor is 0.1.

Find \(\psi_{4}(3.5)\) using simulations.