Friday 1 December 2017

Generalized Linear Model - Part 1 - Multiple Regression Approach

1. A dynamic pricing system can be built for personal lines business, whereby profit loads and risk premiums can be tailored to the individual behavioural characteristics of the customer. 

2. The objective is to feed as much information as possible into these models in order to establish which risk factors are the most predictive.


TRADITIONAL RATING VS MULTIPLE REGRESSION APPROACH
1. The traditional way to determine relativities by rating factor is to look at a series of one-way tables, focusing either on the relative risk premiums or on the relative loss ratios.

Posts on one way tables for incurred & paid claims

2. The multiple regression approach removes any distortions caused by different mixes of business.

3. The benefit of using GLMs is that the models are formulated within a statistical framework. This allows standard statistical tests (such as χ² tests and F tests) to be used for comparing models.
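
The mix-of-business distortion in point 2 can be illustrated with a small sketch. All figures below are hypothetical and chosen only for illustration: costs follow a multiplicative structure in which territory B is truly 1.5 times territory A, but young policyholders (who cost twice as much) are concentrated in territory B. No GLM is fitted here; the within-age comparison simply stands in for what a multiple regression does when it holds the other factor fixed.

```python
# Hypothetical exposure by (territory, age) cell: young drivers cluster in B.
exposure = {("A", "young"): 10, ("A", "old"): 90,
            ("B", "young"): 60, ("B", "old"): 40}

# Hypothetical "true" multiplicative structure for cost per unit of exposure.
base_cost = 100.0
territory_factor = {"A": 1.0, "B": 1.5}
age_factor = {"old": 1.0, "young": 2.0}


def cell_cost(territory, age):
    """True cost per unit of exposure for one cell."""
    return base_cost * territory_factor[territory] * age_factor[age]


def one_way_cost(territory):
    """Exposure-weighted average cost for a territory, ignoring age."""
    cells = [c for c in exposure if c[0] == territory]
    total_cost = sum(exposure[c] * cell_cost(*c) for c in cells)
    total_exposure = sum(exposure[c] for c in cells)
    return total_cost / total_exposure


# One-way table: territory B relative to A, distorted by the age mix.
one_way_relativity = one_way_cost("B") / one_way_cost("A")

# Holding age fixed recovers the true territory effect in every age band.
within_age_relativity = {
    age: cell_cost("B", age) / cell_cost("A", age) for age in age_factor
}

print(round(one_way_relativity, 3))   # about 2.182, well above the true 1.5
print(within_age_relativity)          # 1.5 in both age bands
```

The one-way view overstates the territory effect because part of the age effect leaks into it; comparing like with like removes that leakage.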


CATEGORICAL AND CONTINUOUS VARIABLES
1. The use of categorical variables allows separate parameters to be fitted for each level of a rating factor.

2. Factors that have a natural continuous scale can also be treated as categorical variables. For example, a policyholder age scale can be grouped by bands of ages, and each band considered to be a categorical level. 

3. For example, if we are fitting a model with two factors, the first being territory (with levels A, B and C), and the second being policyholder age (grouped into <25, 25-39, 40-59, 60+), then the design matrix would look something like: 




Note: 

(i). Each column takes the value 1 or 0, depending on whether the particular observation belongs to that level or not. If we included a column for every level of a rating factor, the design matrix would contain a linear dependency (because, for any one categorical factor, the columns for all of its levels sum to 1, which duplicates the intercept column), and so one level is omitted from each categorical variable. This omitted level is known as the base level.

(ii). The base level for territory has been chosen as level A, and the base level for policyholder age as level 40-59. The choice of base is arbitrary, but it is usually taken to be the level with the greatest volume of business.
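
As a minimal sketch of the layout described in point 3 and the notes, the snippet below builds indicator columns for territory and grouped policyholder age, omitting the base levels A and 40-59. It is not the matrix from the original illustration: the four observations, the column names and the helper functions are hypothetical.

```python
TERRITORY_LEVELS = ["A", "B", "C"]
AGE_LEVELS = ["<25", "25-39", "40-59", "60+"]
TERRITORY_BASE = "A"    # base level: no column is created for it
AGE_BASE = "40-59"      # base level: no column is created for it


def column_names():
    """Intercept first, then one column per non-base level of each factor."""
    names = ["intercept"]
    names += [f"territory_{t}" for t in TERRITORY_LEVELS if t != TERRITORY_BASE]
    names += [f"age_{a}" for a in AGE_LEVELS if a != AGE_BASE]
    return names


def design_row(territory, age_band):
    """One row of 1/0 indicators for a single observation."""
    row = [1]  # intercept column
    row += [int(territory == t) for t in TERRITORY_LEVELS if t != TERRITORY_BASE]
    row += [int(age_band == a) for a in AGE_LEVELS if a != AGE_BASE]
    return row


# Hypothetical observations, used only to show the layout.
observations = [("A", "40-59"), ("B", "<25"), ("C", "60+"), ("A", "25-39")]
design_matrix = [design_row(t, a) for t, a in observations]

print(column_names())
for (t, a), row in zip(observations, design_matrix):
    print(t, a, row)
```

A risk in territory A aged 40-59 has zeros in every column except the intercept, which is why the intercept parameter describes the base risk.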

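4. Alternatively, policyholder age can be treated as a continuous variable rather than a categorical one. If the average policyholder ages within the four bands are 22, 32, 50 and 69, then the age indicator columns are replaced by a single column holding these values, and the design matrix changes to: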
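
A minimal sketch of that change, assuming it means that policyholder age enters as a single numeric column holding the average age of each band, while territory keeps its indicator columns with base level A. The observations and the helper name are hypothetical.

```python
# Average policyholder age within each band, as quoted in the text.
AVERAGE_AGE = {"<25": 22, "25-39": 32, "40-59": 50, "60+": 69}


def design_row_continuous(territory, age_band):
    """Row with an intercept, territory indicators (base A) and one age column."""
    return [
        1,                        # intercept
        int(territory == "B"),    # territory B indicator
        int(territory == "C"),    # territory C indicator
        AVERAGE_AGE[age_band],    # age as a continuous variate
    ]


# Hypothetical observations, used only to show the layout.
observations = [("A", "40-59"), ("B", "<25"), ("C", "60+"), ("A", "25-39")]
for t, a in observations:
    print(t, a, design_row_continuous(t, a))
```

Three age parameters are replaced by a single slope, so the model has fewer parameters but forces the age effect to be linear on the scale of the linear predictor.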



GENERALIZED LINEAR MODEL OUTPUT
1. Below is an extract from the output of a generalized linear model fitted to an average cost model.




Note: The "Standard Error" column gives us an idea of the variability of the parameter estimates. 

2. The "mean" parameter value (6.3092) represents the intercept parameter, and the exponential of this number (549.6) is the fitted value for a base risk. For this particular model, a base risk is a male aged 36 to 40 (plus appropriate bases for the factors not shown in the extract). 

3. To derive the fitted value for a non-base risk, simply add up the parameters for the appropriate levels and then take the exponential of the total. For example, the fitted value for a female aged 23 is exp(6.3092 + 0.0091 + 0.1135) = 621.3. We can derive the relative claims experience for any one factor by simply taking the exponential of its parameter. For example, the female relativity is 1.0091 (= exp(0.0091)) compared to males.
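
The arithmetic in points 2 and 3 can be checked with a short sketch. The three parameter values are the ones quoted in the text; the variable names are illustrative, and 0.1135 is taken to be the parameter for the age level that contains age 23.

```python
import math

INTERCEPT = 6.3092   # the "mean" parameter: base risk (male, aged 36 to 40)
FEMALE = 0.0091      # parameter for the female level
AGE_23 = 0.1135      # parameter for the age level containing age 23 (assumed)

# Fitted value for the base risk: exponential of the intercept alone.
base_fitted = math.exp(INTERCEPT)

# Fitted value for a non-base risk: add the parameters, then exponentiate.
female_23_fitted = math.exp(INTERCEPT + FEMALE + AGE_23)

# Relativities: exponential of each individual parameter.
female_relativity = math.exp(FEMALE)
age_23_relativity = math.exp(AGE_23)

print(round(base_fitted, 1))        # 549.6
print(round(female_23_fitted, 1))   # 621.3
print(round(female_relativity, 4))  # 1.0091
```

Because the parameters add before the exponential is taken, the fitted value equals the base value multiplied by each relativity: 549.6 x 1.0091 x exp(0.1135) gives the same 621.3.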


THOUGHTS
1. The techniques can be reverse-engineered to provide customer value models in markets where rates are controlled. 

2. Brockman and Wright recommend that the risk premium is split into frequency and severity (average cost) for each type of claim covered under the policy. The reasons for doing this include:

(i) Response variables for frequency and severity follow different statistical distributions. Usually a Poisson error structure is used for a frequency model and a gamma error structure for a severity model. A log link function is usually used for both frequency and severity.

(ii) A greater insight into the underlying cause of claims experience variability is provided.

(iii) Certain models are inherently more volatile than others. For example, the average cost of liability claims is likely to be much more volatile than the frequency of own damage claims. By modelling total risk premium rather than splitting it into its constituent parts, we would not be able to identify whether an apparently anomalous trend is the result of a random fluctuation in liability average cost or a genuine trend in the own damage frequency.
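
To show how the two components recombine, here is a minimal sketch assuming a log link for both models, as in (i). The parameter values and the single "young" rating level are hypothetical, and nothing is fitted; the point is only that the risk premium is the product of fitted frequency and fitted severity, so the log-scale parameters add.

```python
import math

# Hypothetical log-scale parameters for one claim type.
frequency_params = {"intercept": math.log(0.10), "young": 0.40}
severity_params = {"intercept": math.log(1500.0), "young": -0.10}


def fitted(params, levels):
    """Fitted value under a log link: exp(intercept + sum of level parameters)."""
    return math.exp(params["intercept"] + sum(params[lvl] for lvl in levels))


def risk_premium(levels):
    """Risk premium = fitted frequency x fitted severity."""
    return fitted(frequency_params, levels) * fitted(severity_params, levels)


base_premium = risk_premium([])            # base risk: 0.10 x 1500
young_premium = risk_premium(["young"])
young_relativity = young_premium / base_premium

print(round(base_premium, 2))
print(round(young_relativity, 4))          # equals exp(0.40 - 0.10)
```

Keeping the two parameter sets separate shows whether a change in the risk premium relativity comes from frequency, from severity, or from both pulling in opposite directions, as in this example.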

(Source: Karl P Murphy, Michael J Brockman, Peter K W Lee)