class: center, middle, inverse, title-slide

# DeepTriangle: A Deep Learning Approach to Loss Reserving
## 2018 Reserves Call Paper Program
kevinykuo.com/talk/2018/09/clrs/
arXiv:1804.09253
### Kevin Kuo
@kevinykuo
### September 2018

---

# About me

.large[
- Not an actuary
- Used to be one though, on paper, at least
- Was a "data scientist" when that was the cool thing to do
- Write open source R packages for a living
- Do research for fun ("nights & weekends")
]

---
class: center, middle

# Do we need another reserving method?

---
class: center, middle

# What's the motivation?

---

# State of the art

<!-- -->

--

.Large[🤔]

---

# State of the art

`while (TRUE) {`

<!-- -->

.Large[🤔]

`}`

---
class: center, middle

# Since we're in the OC...

---

.pull-left[
The cynical view

<img src="img/chopper1.jpg" width="100%" />
]

--

.pull-right[
<img src="img/chopper2a.jpg" width="75%" />
]

---
class: center, middle, inverse

# What if we could automate the "boring stuff" so actuaries can "focus on insights"?

--

.Large[
Boring stuff? Insights?
]

---
class: center, middle

# How much should/could we automate? How much actuarial input should be in the loop?

---
class: center, middle, inverse

# Actuaries on innovation...

---

From CAS [press release](https://www.casact.org/press/index.cfm?fa=viewArticle&articleID=2831):

<img src="img/cas_title.png" width="75%" />

.large[
> Survey participants were also asked to identify top statistical concepts that predictive modelers should understand. Generalized linear modeling (GLM), a **cutting-edge mathematical tool** on which modern ratemaking depends, was the top answer.
]

(bold mine)

---
class: center, middle

<img src="img/glm_abstract.png" width="55%" />

--

```r
# Number of years a concept remains cutting-edge for actuaries
2015 - 1972
```

```
## [1] 43
```

---
class: center, middle

## What if we could expose ourselves to quantitative techniques in other fields and think about applying them to actuarial work?

--

## Actually, we're doing that already! (E.g. compartmental models, machine learning on claims level data.)

--

## Are we doing enough as a community?

---
class: center, middle, inverse

# How much of this R&D will actually make it "into production"?

--

# Are we doing fancy new tech for the sake of doing fancy new tech?

--

# Maybe we can extract tidbits from each new research project and synthesize them.

---
class: center, middle

# We'll come back to this discussion, but let's talk about the paper now!

--

# We're not proposing a reserving framework to rule 'em all, but hopefully the ideas can nudge us towards one.

---

# Run-off triangles

Bet ya haven't seen this before

```
## # A tibble: 9 x 10
## # Groups:   accident_year [9]
##   accident_year   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`
##           <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1988   133   200    98   139    45     0     0    -1     0
## 2          1989   934   812   619   214   184   203   -26    38    NA
## 3          1990  2030  2834  2016  1207   508   148    20    NA    NA
## 4          1991  4537  6990  3596  1533   665   755    NA    NA    NA
## 5          1992  7564  8497  6404  2739  1313    NA    NA    NA    NA
## 6          1993  8343 11557  6832  3347    NA    NA    NA    NA    NA
## 7          1994 12565 14357  6945    NA    NA    NA    NA    NA    NA
## 8          1995 13437 12575    NA    NA    NA    NA    NA    NA    NA
## 9          1996 12604    NA    NA    NA    NA    NA    NA    NA    NA
```

--

Yes, this is an incremental paid loss triangle

---

# Treating this as a predictive modeling problem

.large[
Each cell of the triangle corresponds to a row in our modeling dataset, and the response we're trying to predict is the *incremental* paid loss at that cell. We just need to come up with some predictors
]

```
## # A tibble: 8 x 4
## # Groups:   accident_year [8]
##   accident_year development_lag incremental_paid_loss predictors
##           <int>           <int>                 <dbl> <chr>
## 1          1988               1                   133 ?!?!?!?!?!?
## 2          1989               1                   934 ?!?!?!?!?!?
## 3          1990               1                  2030 ?!?!?!?!?!?
## 4          1991               1                  4537 ?!?!?!?!?!?
## 5          1992               1                  7564 ?!?!?!?!?!?
## 6          1993               1                  8343 ?!?!?!?!?!?
## 7          1994               1                 12565 ?!?!?!?!?!?
## 8          1995               1                 13437 ?!?!?!?!?!?
```

.large[Then we can do something like]

```r
fancy_AI_algorithm(incremental_paid_loss ~ predictors, data = data)
```

---
class: center, middle, inverse

# Crash course on "fancy AI algorithms"

---

# Deep learning

.large[Really just a rebranding of neural networks...]

--

.large[...which is a machine learning technique]

--

.large[...i.e. it's a way to learn a function that takes some inputs to predict some outputs

(if you're familiar with the jargon, we're going to focus on *supervised learning* here)
]

---

# Heuristics with some symbols

Basically, given some inputs `\(X\)`, we wanna predict `\(Y\)` via some function `\(f\)`, like

$$ \widehat{Y} = f(X) $$

--

For neural nets, our function `\(f\)` looks something like

`$$f = f_{N}\circ f_{N-1}\circ\dots\circ f_1$$`

where `\(f_1, \dots, f_N\)` are functions parameterized by *weights*.

--

For the most part, you can think of these functions as matrix multiplication followed by some type of thresholding/nonlinearity.

--

(without the nonlinearity, our `\(f\)` would just be a simple linear transformation and that wouldn't sound very impressive 🤔)

---

# Picture of a 1-layer network

<img src="img/nn-basics-1.jpeg" width="80%" />

---

# Each of the outputs can be a node

Stacking layers of nodes

<img src="img/nn-basics-2.png" width="70%" />

---

# Successively distilling the input

<img src="img/mnist_representations.png" width="80%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-1.png" width="80%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-2.png" width="60%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-3.png" width="60%" />

---

# Data

.large[
Schedule P data from [http://www.casact.org/research/index.cfm?fa=loss_reserves_data](http://www.casact.org/research/index.cfm?fa=loss_reserves_data).
]

.full-width[.content-box-green[.large[10 accident years (1988-1997) of paid and incurred losses, with 10 development lags, from a bunch of companies and lines of business.]]]

.large[For our study, we're taking 4 LOBs x 50 triangles.]

---

# Response & predictors

.large[
- We'll be building one model for each LOB.

**Response**:

- Incremental paid losses
- Total claims outstanding/case reserves

**Predictors**:

- Time series of paid losses and case reserves along accident year
- Company (because we're using data from multiple companies simultaneously)
]

We'll also divide the numerical values by the net premium associated with the AY.

---

# Predictor - experience

.large[For each cell in the triangle, we take the experience up to the previous calendar year. For example, for AY 1988 we have:]

```
## # A tibble: 9 x 3
##   development_lag incremental_paid_loss paid_history
##             <int>                 <dbl> <chr>
## 1               1                   133 ""
## 2               2                   200 133
## 3               3                    98 133, 200
## 4               4                   139 133, 200, 98
## 5               5                    45 133, 200, 98, 139
## 6               6                     0 133, 200, 98, 139, 45
## 7               7                     0 133, 200, 98, 139, 45, 0
## 8               8                    -1 133, 200, 98, 139, 45, 0, 0
## 9               9                     0 133, 200, 98, 139, 45, 0, 0, -1
```

--

We also do the same thing to case reserves, so we'll have two time series for each cell we're predicting.

---

# Predictor - company code

.large[We consider multiple companies simultaneously when we train our models. In practice, we won't have our competitors' data. But if our book is composed of companies operating across different territories, using this variable gives us a way to *segment* our portfolio for reserving purposes.]

---

# Architecture

<img src="img/nn1.png" width="20%" style="display: block; margin: auto;" />

.Large[Note that we're predicting two quantities simultaneously, in one model. Also, we're combining different types of inputs: a couple of time series and a categorical factor.]

---

# Embedding layer

.large[
The embedding layer maps each company code index to a fixed-length vector. While the dimension stays the same, the actual values for each company are *learned* by the network during training, in order to optimize our objective.

For example, if the specified length is 5, company #2 might get mapped to `c(0.4, 1.2, -3.7, 3.3, 0.2)`.
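To make this concrete, an embedding "layer" is just a lookup into a weight matrix. A minimal sketch (Python/NumPy rather than this deck's R, with made-up weight values — in the real model these rows are learned by gradient descent):

```python
import numpy as np

# One row per company code; embedding length 5. These numbers are
# invented for illustration -- during training they would be learned.
embedding_weights = np.array([
    [ 0.1, -0.2,  0.0,  0.5, -1.1],  # company #0
    [-0.9,  0.3,  2.1, -0.4,  0.7],  # company #1
    [ 0.4,  1.2, -3.7,  3.3,  0.2],  # company #2
])

def embed(company_index):
    # The "layer" is nothing more than indexing into the weight matrix
    return embedding_weights[company_index]

print(embed(2))  # company #2 -> the vector [0.4, 1.2, -3.7, 3.3, 0.2]
```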

We can think of this representation as a proxy for characteristics of the companies that are not captured by the time series data input, e.g. size of book, case reserving philosophy, etc.
]

---

# Neural network for sequences

.large[
Just like a vanilla feedforward neural network, except we feed the sequential input... in sequence.
]

<img src="img/rnn.png" width="50%" style="display: block; margin: auto;" />

---

# Helping the RNN remember

.large[
Obligatory complicated recurrent network figure! (don't worry about the details)

Intuition: This Gated Recurrent Unit (GRU) architecture allows the network to "better" remember values from earlier on in the sequence.

<img src="img/gru.png" width="50%" style="display: block; margin: auto;" />
]

---

# Implementation in R

The model itself is under 30 lines of code! See [github.com/kevinykuo/deeptriangle](https://github.com/kevinykuo/deeptriangle).

<img src="img/model-code.png" width="50%" />

---

# Some results

Sample results from the company with the most data in the dataset...

<img src="img/ppauto-results.png" width="55%" style="display: block; margin: auto;" />

---

# Some results

Workers' comp

<img src="img/wkcomp-results.png" width="55%" style="display: block; margin: auto;" />

---

# Benchmarking

Let's define a couple of metrics to bench this new approach against existing methods:

`$$RMSPE_l = \sqrt{\frac{1}{|\mathcal{C}_l|}\sum_{C\in\mathcal{C}_l}\left(\frac{\widehat{UL}_C - UL_C}{UL_C}\right)^2}$$`

and

`$$MAPE_l = \frac{1}{|\mathcal{C}_l|}\sum_{C\in\mathcal{C}_l}\left|\frac{\widehat{UL}_C - UL_C}{UL_C}\right|,$$`

where `\(\mathcal{C}_l\)` is the set of companies in line of business `\(l\)`, and `\(\widehat{UL}_C\)` and `\(UL_C\)` are the predicted and actual cumulative ultimate losses, respectively, for company `\(C\)`.

---

# Benchmarking

Results for other methods taken from [http://www.casact.org/pubs/monographs/index.cfm?fa=meyers-monograph01](http://www.casact.org/pubs/monographs/index.cfm?fa=meyers-monograph01).
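The two metrics boil down to a few lines of arithmetic. A sketch in Python with made-up predicted and actual ultimates for three hypothetical companies (not numbers from the study):

```python
import numpy as np

# Hypothetical predicted and actual ultimate losses for one line of business
predicted = np.array([1050.0, 980.0, 2100.0])
actual = np.array([1000.0, 1000.0, 2000.0])

pct_error = (predicted - actual) / actual  # per-company percentage error

rmspe = np.sqrt(np.mean(pct_error ** 2))   # root mean squared percentage error
mape = np.mean(np.abs(pct_error))          # mean absolute percentage error

print(rmspe, mape)  # errors of +5%, -2%, +5% give rmspe ~0.0424, mape 0.04
```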

<img src="img/comparison_table.png" width="70%" style="display: block; margin: auto;" />

---

# Discussion

.large[
Future work?

- Prediction intervals for reserve variability.
- Claims level analytics, where we can take into account things like adjusters' notes and images.
- Policy level analytics, towards a holistic approach to pricing + reserving.

Slides: [kevinykuo.com/talk/2018/09/clrs/](https://kevinykuo.com/talk/2018/09/clrs/)

Paper: [arXiv:1804.09253](https://arxiv.org/abs/1804.09253)

Code: [github.com/kevinykuo/deeptriangle](https://github.com/kevinykuo/deeptriangle)
]
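
---

# Appendix: two outputs, one model

A toy illustration of the multi-task idea from the architecture slide: one shared representation feeds two output heads. This is a Python/NumPy sketch with random, made-up weights, not the paper's actual R/Keras model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the encoded inputs (history time series + company embedding)
x = rng.normal(size=4)

# Shared hidden layer: matrix multiplication followed by a nonlinearity (ReLU)
W_hidden = rng.normal(size=(8, 4))
hidden = np.maximum(W_hidden @ x, 0.0)

# Two output heads read the *same* shared hidden state
w_paid = rng.normal(size=8)  # head 1: incremental paid loss
w_os = rng.normal(size=8)    # head 2: total claims outstanding

paid_pred = w_paid @ hidden
os_pred = w_os @ hidden
print(paid_pred, os_pred)
```

Training both heads jointly means the shared layers must learn features useful for paid losses *and* case reserves at once.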