class: center, middle, inverse, title-slide

# DeepTriangle: A Deep Learning Approach to Loss Reserving
## 2018 Reserves Call Paper Program
kevinykuo.com/talk/2018/09/clrs/
arXiv:1804.09253
### Kevin Kuo
@kevinykuo
### September 2018

---

# About me

.large[
- Not an actuary
- Used to be one though, on paper, at least
- Was a "data scientist" when that was the cool thing to do
- Write open source R packages for a living
- Do research for fun ("nights & weekends")
]

---
class: center, middle

# Do we need another reserving method?

---
class: center, middle

# What's the motivation?

---

# State of the art

<!-- -->

--

.Large[🤔]

---

# State of the art

`while (TRUE) {`

<!-- -->

.Large[🤔]

`}`

---
class: center, middle

# Since we're in the OC...

---

.pull-left[
The cynical view

<img src="img/chopper1.jpg" width="100%" />
]

--

.pull-right[
<img src="img/chopper2a.jpg" width="75%" />
]

---
class: center, middle, inverse

# What if we could automate the "boring stuff" so actuaries can "focus on insights"?

--

.Large[
Boring stuff? Insights?
]

---
class: center, middle

# How much should/could we automate? How much actuarial input should be in the loop?

---
class: center, middle, inverse

# Actuaries on innovation...

---

From CAS [press release](https://www.casact.org/press/index.cfm?fa=viewArticle&articleID=2831):

<img src="img/cas_title.png" width="75%" />

.large[
> Survey participants were also asked to identify top statistical concepts that predictive modelers should understand. Generalized linear modeling (GLM), a **cutting-edge mathematical tool** on which modern ratemaking depends, was the top answer.
]

(bold mine)

---
class: center, middle

<img src="img/glm_abstract.png" width="55%" />

--

```r
# Number of years a concept remains cutting-edge for actuaries
2015 - 1972
```

```
## [1] 43
```

---
class: center, middle

## What if we could expose ourselves to quantitative techniques in other fields and think about applying them to actuarial work?

--

## Actually, we're doing that already! (E.g. compartmental models, machine learning on claims level data.)

--

## Are we doing enough as a community?

---
class: center, middle, inverse

# How much of this R&D will actually make it "into production"?

--

# Are we doing fancy new tech for the sake of doing fancy new tech?

--

# Maybe we can extract tidbits from each new research project and synthesize them.

---
class: center, middle

# We'll come back to this discussion, but let's talk about the paper now!

--

# We're not proposing a reserving framework to rule 'em all, but hopefully the ideas can nudge us towards one.

---

# Run-off triangles

Bet ya haven't seen this before

```
## # A tibble: 9 x 10
## # Groups:   accident_year [9]
##   accident_year   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`
##           <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1988   133   200    98   139    45     0     0    -1     0
## 2          1989   934   812   619   214   184   203   -26    38    NA
## 3          1990  2030  2834  2016  1207   508   148    20    NA    NA
## 4          1991  4537  6990  3596  1533   665   755    NA    NA    NA
## 5          1992  7564  8497  6404  2739  1313    NA    NA    NA    NA
## 6          1993  8343 11557  6832  3347    NA    NA    NA    NA    NA
## 7          1994 12565 14357  6945    NA    NA    NA    NA    NA    NA
## 8          1995 13437 12575    NA    NA    NA    NA    NA    NA    NA
## 9          1996 12604    NA    NA    NA    NA    NA    NA    NA    NA
```

--

Yes, this is an incremental paid loss triangle

---

# Treating this as a predictive modeling problem

.large[
Each cell of the triangle corresponds to a row in our modeling dataset, and the response we're trying to predict is the *incremental* paid loss at that cell. We just need to come up with some predictors
]

```
## # A tibble: 8 x 4
## # Groups:   accident_year [8]
##   accident_year development_lag incremental_paid_loss predictors
##           <int>           <int>                 <dbl> <chr>
## 1          1988               1                   133 ?!?!?!?!?!?
## 2          1989               1                   934 ?!?!?!?!?!?
## 3          1990               1                  2030 ?!?!?!?!?!?
## 4          1991               1                  4537 ?!?!?!?!?!?
## 5          1992               1                  7564 ?!?!?!?!?!?
## 6          1993               1                  8343 ?!?!?!?!?!?
## 7          1994               1                 12565 ?!?!?!?!?!?
## 8          1995               1                 13437 ?!?!?!?!?!?
```

.large[Then we can do something like]

```r
fancy_AI_algorithm(incremental_paid_loss ~ predictors, data = data)
```

---
class: center, middle, inverse

# Crash course on "fancy AI algorithms"

---

# Deep learning

.large[Really just a rebranding of neural networks...]

--

.large[...which is a machine learning technique]

--

.large[...i.e. it's a way to learn a function that takes some inputs to predict some outputs

(if you're familiar with the jargon, we're going to focus on *supervised learning* here)
]

---

# Heuristics with some symbols

Basically, given some inputs `\(X\)`, we wanna predict `\(Y\)` via some function `\(f\)`, like

$$ \widehat{Y} = f(X) $$

--

For neural nets, our function `\(f\)` looks something like

`$$f = f_{N}\circ f_{N-1}\circ\dots\circ f_1$$`

where `\(f_1, \dots, f_N\)` are functions parameterized by *weights*.

--

For the most part, you can think of these functions as matrix multiplication followed by some type of thresholding/nonlinearity.

--

(without the nonlinearity, our `\(f\)` would just be a simple linear transformation and that wouldn't sound very impressive 🤔)

---

# Picture of a 1-layer network

<img src="img/nn-basics-1.jpeg" width="80%" />

---

# Each of the outputs can be a node

Stacking layers of nodes

<img src="img/nn-basics-2.png" width="70%" />

---

# Successively distilling the input

<img src="img/mnist_representations.png" width="80%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-1.png" width="80%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-2.png" width="60%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-3.png" width="60%" />

---

# Data

.large[
Schedule P data from [http://www.casact.org/research/index.cfm?fa=loss_reserves_data](http://www.casact.org/research/index.cfm?fa=loss_reserves_data).
]

.full-width[.content-box-green[.large[10 accident years (1988-1997) of paid and incurred losses, with 10 development lags, from a bunch of companies and lines of business.]]]

.large[For our study, we're taking 4 LOBs x 50 triangles.]

---

# Response & predictors

.large[
- We'll be building one model for each LOB.

**Response**:

- Incremental paid losses
- Total claims outstanding/case reserves

**Predictors**:

- Time series of paid losses and case reserves along accident year
- Company (because we're using data from multiple companies simultaneously)
]

We'll also divide the numerical values by the net premium associated with the AY.

---

# Predictor - experience

.large[For each cell in the triangle, we take the experience up to the previous calendar year. For example, for AY 1988 we have:]

```
## # A tibble: 9 x 3
##   development_lag incremental_paid_loss paid_history
##             <int>                 <dbl> <chr>
## 1               1                   133 ""
## 2               2                   200 133
## 3               3                    98 133, 200
## 4               4                   139 133, 200, 98
## 5               5                    45 133, 200, 98, 139
## 6               6                     0 133, 200, 98, 139, 45
## 7               7                     0 133, 200, 98, 139, 45, 0
## 8               8                    -1 133, 200, 98, 139, 45, 0, 0
## 9               9                     0 133, 200, 98, 139, 45, 0, 0, -1
```

--

We also do the same thing to case reserves, so we'll have two time series for each cell we're predicting.

---

# Predictor - company code

.large[We consider multiple companies simultaneously when we train our models. In practice, we won't have our competitors' data. But if our book is composed of companies operating across different territories, using this variable gives us a way to *segment* our portfolio for reserving purposes.]

---

# Architecture

<img src="img/nn1.png" width="20%" style="display: block; margin: auto;" />

.Large[Note that we're predicting two quantities simultaneously, in one model. Also, we're combining different types of inputs: a couple of time series and a categorical factor.]

---

# Embedding layer

.large[
The embedding layer maps each company code index to a fixed-length vector. While the dimension stays the same, the actual values for each company are *learned* by the network during training, in order to optimize our objective.

For example, if the specified length is 5, company #2 might get mapped to `c(0.4, 1.2, -3.7, 3.3, 0.2)`.
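To make this concrete, an embedding "layer" is just a lookup into a weight matrix. A minimal sketch (Python/NumPy rather than this deck's R, with made-up weight values — in the real model these rows are learned by gradient descent):

```python
import numpy as np

# One row per company code; embedding length 5. These numbers are
# invented for illustration -- during training they would be learned.
embedding_weights = np.array([
    [ 0.1, -0.2,  0.0,  0.5, -1.1],  # company #0
    [-0.9,  0.3,  2.1, -0.4,  0.7],  # company #1
    [ 0.4,  1.2, -3.7,  3.3,  0.2],  # company #2
])

def embed(company_index):
    # The "layer" is nothing more than indexing into the weight matrix
    return embedding_weights[company_index]

print(embed(2))  # company #2 -> the vector [0.4, 1.2, -3.7, 3.3, 0.2]
```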

We can think of this representation as a proxy for characteristics of the companies that are not captured by the time series data input, e.g. size of book, case reserving philosophy, etc.
]

---

# Neural network for sequences

.large[
Just like a vanilla feedforward neural network, except we feed the sequential input... in sequence.
]

<img src="img/rnn.png" width="50%" style="display: block; margin: auto;" />

---

# Helping the RNN remember

.large[
Obligatory complicated recurrent network figure! (don't worry about the details)

Intuition: This Gated Recurrent Unit (GRU) architecture allows the network to "better" remember values from earlier on in the sequence.

<img src="img/gru.png" width="50%" style="display: block; margin: auto;" />
]

---

# Implementation in R

The model itself is under 30 lines of code! See [github.com/kevinykuo/deeptriangle](https://github.com/kevinykuo/deeptriangle).

<img src="img/model-code.png" width="50%" />

---

# Some results

Sample results from the company with the most data in the dataset...

<img src="img/ppauto-results.png" width="55%" style="display: block; margin: auto;" />

---

# Some results

Workers' comp

<img src="img/wkcomp-results.png" width="55%" style="display: block; margin: auto;" />

---

# Benchmarking

Let's define a couple of metrics to bench this new approach against existing methods:

`$$RMSPE_l = \sqrt{\frac{1}{|\mathcal{C}_l|}\sum_{C\in\mathcal{C}_l}\left(\frac{\widehat{UL}_C - UL_C}{UL_C}\right)^2}$$`

and

`$$MAPE_l = \frac{1}{|\mathcal{C}_l|}\sum_{C\in\mathcal{C}_l}\left|\frac{\widehat{UL}_C - UL_C}{UL_C}\right|,$$`

where `\(\mathcal{C}_l\)` is the set of companies in line of business `\(l\)`, and `\(\widehat{UL}_C\)` and `\(UL_C\)` are the predicted and actual cumulative ultimate losses, respectively, for company `\(C\)`.

---

# Benchmarking

Results for other methods taken from [http://www.casact.org/pubs/monographs/index.cfm?fa=meyers-monograph01](http://www.casact.org/pubs/monographs/index.cfm?fa=meyers-monograph01).
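The two metrics boil down to a few lines of arithmetic. A sketch in Python with made-up predicted and actual ultimates for three hypothetical companies (not numbers from the study):

```python
import numpy as np

# Hypothetical predicted and actual ultimate losses for one line of business
predicted = np.array([1050.0, 980.0, 2100.0])
actual = np.array([1000.0, 1000.0, 2000.0])

pct_error = (predicted - actual) / actual  # per-company percentage error

rmspe = np.sqrt(np.mean(pct_error ** 2))   # root mean squared percentage error
mape = np.mean(np.abs(pct_error))          # mean absolute percentage error

print(rmspe, mape)  # errors of +5%, -2%, +5% give rmspe ~0.0424, mape 0.04
```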

<img src="img/comparison_table.png" width="70%" style="display: block; margin: auto;" />

---

# Discussion

.large[
Future work?

- Prediction intervals for reserve variability.
- Claims level analytics, where we can take into account things like adjusters' notes and images.
- Policy level analytics, towards a holistic approach to pricing + reserving.

Slides: [kevinykuo.com/talk/2018/09/clrs/](https://kevinykuo.com/talk/2018/09/clrs/)

Paper: [arXiv:1804.09253](https://arxiv.org/abs/1804.09253)

Code: [github.com/kevinykuo/deeptriangle](https://github.com/kevinykuo/deeptriangle)
]
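
---

# Appendix: two outputs, one model

A toy illustration of the multi-task idea from the architecture slide: one shared representation feeds two output heads. This is a Python/NumPy sketch with random, made-up weights, not the paper's actual R/Keras model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the encoded inputs (history time series + company embedding)
x = rng.normal(size=4)

# Shared hidden layer: matrix multiplication followed by a nonlinearity (ReLU)
W_hidden = rng.normal(size=(8, 4))
hidden = np.maximum(W_hidden @ x, 0.0)

# Two output heads read the *same* shared hidden state
w_paid = rng.normal(size=8)  # head 1: incremental paid loss
w_os = rng.normal(size=8)    # head 2: total claims outstanding

paid_pred = w_paid @ hidden
os_pred = w_os @ hidden
print(paid_pred, os_pred)
```

Training both heads jointly means the shared layers must learn features useful for paid losses *and* case reserves at once.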