Assessing the performance of a model
Lorenz curve
The Lorenz curve describes the quality of the model's predictions. It is computed by:
Sorting all the observations, from the highest prediction to the lowest;
Plotting the cumulative observed value along the sorted predictions.
The curve thus provides a measure of the segmentation of the portfolio under analysis.
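As an illustration, here is a minimal NumPy sketch of this construction (the function and array names are hypothetical; Akur8 computes the curve internally):

```python
import numpy as np

def lorenz_curve(y_obs, y_pred):
    """Cumulative share of observed values, sorted by decreasing prediction."""
    order = np.argsort(y_pred)[::-1]                # highest predictions first
    cum_obs = np.cumsum(y_obs[order]) / y_obs.sum()
    cum_pop = np.arange(1, len(y_obs) + 1) / len(y_obs)
    return cum_pop, cum_obs                         # x and y of the Lorenz curve
```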
Example: Thinking in terms of premiums, the graph below allows us to draw conclusions such as:
the cumulative 20% of the highest premiums holds around 32% of the losses
...or more generally:
the cumulative 20% of the highest predicted risk actually represent around 32% of the observed risk
This hints at a favorable segmentation of our portfolio by the model. Lorenz curves are built on the test data from the Cross-Validation, so that they reflect the out-of-sample performance of the model.
Lift curve
The Lift curve is built by sorting predictions from lowest to highest and bucketing them into 20 groups, each representing 5% of the predictions.
For each bucket we then display the average prediction from the model and the average observation, in order to assess the quality of the model's predictions. Around the averages we display error bars to visualize variability across the different folds.
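A minimal sketch of the bucketing logic, assuming NumPy arrays of predictions and observations (the error bars across folds are omitted):

```python
import numpy as np

def lift_curve(y_obs, y_pred, n_buckets=20):
    """Average prediction and average observation per prediction-sorted bucket."""
    order = np.argsort(y_pred)                  # lowest predictions first
    buckets = np.array_split(order, n_buckets)  # ~5% of the rows in each bucket
    avg_pred = np.array([y_pred[b].mean() for b in buckets])
    avg_obs = np.array([y_obs[b].mean() for b in buckets])
    return avg_pred, avg_obs
```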
Note: Lift curves are built on the test data from the Cross-Validation, so that they reflect the out-of-sample performance of the model.
ROC curve
The ROC curve allows us to understand Propensity-Logistic models. Before detailing the ROC curve, we recall some useful definitions for a logistic model.
The ROC curve relies on the notion of classification threshold. We define a classification threshold as the threshold above which we will consider predictions as positive outcomes and below which we will consider predictions as negative outcomes.
A positive outcome is a prediction associated with the value TRUE of a binary variable, i.e. the value 1. A negative outcome is, by contrast, a prediction associated with the value FALSE of a binary variable, i.e. the value 0. A TRUE POSITIVE (TP) is an outcome where the model correctly predicts the positive class. Similarly, a FALSE POSITIVE (FP) is an outcome where the model incorrectly predicts TRUE when the actual response is FALSE. The definitions of the TRUE NEGATIVE (TN) and FALSE NEGATIVE (FN) follow immediately.
Using these definitions, we can express meaningful ratios. On one side, the TRUE POSITIVE RATE (TPR) expresses the percentage of correct positive predictions:

TPR = TP / (TP + FN)

On the other side, the FALSE POSITIVE RATE (FPR) expresses the percentage of incorrect positive predictions:

FPR = FP / (FP + TN)

Thus, a perfect utopian model, which correctly predicts all members of the positive class and all members of the negative class, will have a TPR equal to 1 and an FPR equal to 0.
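These ratios can be computed directly from the definitions above; a small sketch (a hypothetical helper, not Akur8's internals):

```python
import numpy as np

def tpr_fpr(y_true, y_score, threshold):
    """TPR and FPR for one classification threshold, from the definitions above."""
    y_hat = (y_score >= threshold).astype(int)  # positive outcome above threshold
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    tn = np.sum((y_hat == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

# Sweeping the threshold from 1 down to 0 traces the ROC curve point by point.
```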
On the graph below, we see that there exists a threshold for which we can have a TPR of 90% and an FPR of 60%.
Note: Reading the graph does not allow us to determine the exact value of the threshold; we just know that it exists.
The AUC metric, which is computed as the area under the ROC curve, is commonly used as a performance metric for classification models. More information can be found on the Wikipedia entry for the Receiver operating characteristic.
A perfect utopian model would have a TPR of 1 and an FPR of 0 for every classification threshold, so its ROC curve reduces to a single point in the top-left corner.
ROC curves are built on the test data from the Cross-Validation, so that they reflect the out-of-sample performance of the model.
Statistics
In the Model Overview, the "Statistics" tab gives you a comprehensive understanding of the model's performance:
You will find more detailed explanations on each of those metrics in the section Performance metrics.
For each metric, Akur8 gives the results of the computations performed on different samples (see the sketch after this list):
Train Full: the model is trained on 100% of the Training Set, and its performance is computed on 100% of the Training Set (this is an in-sample metric!)
Train K-fold: each K-model is evaluated on the part of the data that was used to train it (this is an in-sample metric!).
Test K-fold: each K-model is evaluated on the Test part of the Cross-Validation: this is an out-of-sample metric, which provides a good estimate of the generalization error.
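Schematically, the three estimates relate as in the sketch below, assuming generic `fit` and `metric` callables (hypothetical names; Akur8 performs these computations internally):

```python
import numpy as np
from sklearn.model_selection import KFold

def evaluate_samples(X, y, fit, metric, n_splits=5):
    """Compute Train Full, Train K-fold and Test K-fold estimates of a metric."""
    full_model = fit(X, y)
    train_full = metric(y, full_model.predict(X))            # in-sample

    train_kfold, test_kfold = [], []
    for tr, te in KFold(n_splits=n_splits).split(X):
        m = fit(X[tr], y[tr])
        train_kfold.append(metric(y[tr], m.predict(X[tr])))  # in-sample
        test_kfold.append(metric(y[te], m.predict(X[te])))   # out-of-sample
    return train_full, np.mean(train_kfold), np.mean(test_kfold)
```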
It is also possible to display the Performance Metric for each fold by clicking on the arrow at the right end of each line:
The variations from one fold to another are used to build the Error Bars around each dot displayed on the different Grid Searches:
Residuals
The model's residuals can be analyzed on the "Residuals" tab of the Model Overview. You can choose between Deviance Residuals and Quantile Residuals (see the section "Residuals" in Performance metrics for more details):
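As a concrete example of deviance residuals, here is a sketch for a Poisson frequency model (this is the standard Poisson deviance formula; Akur8's exact parameterization may differ):

```python
import numpy as np

def poisson_deviance_residuals(y, mu):
    """Signed square root of each observation's contribution to the Poisson deviance."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ylogy = np.where(y > 0, y * np.log(y / mu), 0.0)  # y*log(y/mu), 0 when y == 0
    dev = 2.0 * (ylogy - (y - mu))
    return np.sign(y - mu) * np.sqrt(np.maximum(dev, 0.0))
```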
Variable importance
The Variable Importance graph shows which variables are included in the model, as well as how important they are with respect to their impact on the predictions.
There are many ways in Machine Learning to assess this, but as our models are coefficient-based, we can derive the Variable Importance directly from the coefficient spreads.
100/0 spread
The 100/0 spread is computed by looking directly at the maximum and minimum coefficients for a given variable, and then by computing their difference.
For multiplicative and logistic models, starting from the percentage coefficients displayed in the tool, the spread is calculated as:
For the above example, the 100/0 spread is 123%, calculated as follows:
For identity-link models (Gaussian regression), the spread is simply the difference between the maximum and minimum coefficients.
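A minimal sketch, assuming the multiplicative spread is the ratio of the extreme factors (an assumption consistent with the identity-link case being a plain difference):

```python
def spread_100_0(coeffs_pct, multiplicative=True):
    """100/0 spread from percentage coefficients (e.g. 0.25 for +25%)."""
    c_max, c_min = max(coeffs_pct), min(coeffs_pct)
    if multiplicative:                        # log or logit link (assumed convention)
        return (1 + c_max) / (1 + c_min) - 1
    return c_max - c_min                      # identity link: plain difference
```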
95/5 spread
The 95/5 spread is computed similarly to the 100/0 spread, but prior to the calculation, the 5% riskiest exposure and the 5% least risky exposure are removed.
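A sketch of the exposure trimming, assuming NumPy arrays of percentage coefficients and exposure per observation (hypothetical helper, multiplicative convention as above):

```python
import numpy as np

def spread_95_5(coeffs_pct, exposure):
    """Trim 5% of exposure at each risk extreme, then compute the spread."""
    order = np.argsort(coeffs_pct)                 # least to most risky
    cum = np.cumsum(exposure[order]) / exposure.sum()
    kept = order[(cum > 0.05) & (cum <= 0.95)]     # drop 5% of exposure per side
    c = coeffs_pct[kept]
    return (1 + c.max()) / (1 + c.min()) - 1
```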
Understanding variable graphs
The Variable Graph displays different plots to help understand if and how the model captures the signal for each variable.
Observed values
The Observed Values are displayed in purple. They represent the average of the target variable (divided by exposure if toggled and weighted by exposure if defined) on the modeling part of the dataset for each level:
Predicted values
The Predicted Values are displayed in orange. They represent the average prediction of the model (weighted by exposure if defined) on the modeling part of the dataset for each level:
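One possible reading of these two computations, as a pandas sketch (the column names are hypothetical):

```python
import pandas as pd

def level_averages(df, var, target, pred, exposure):
    """Observed and exposure-weighted predicted averages per level of `var`."""
    sums = df.assign(w_pred=df[pred] * df[exposure]) \
             .groupby(var)[[target, "w_pred", exposure]].sum()
    observed = sums[target] / sums[exposure]      # e.g. claim count per exposure unit
    predicted = sums["w_pred"] / sums[exposure]   # exposure-weighted prediction
    return observed, predicted
```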
Model coefficients
The model's coefficients are displayed in green (and shown in percentage).
They can be either multiplicative (for models with a log link) or additive (for models with a logit or identity link).
In either case, the coefficients are translated to be displayed in a more understandable way.
For multiplicative models, for which the final output would be:
Then the coefficient will be displayed as:
C_ref is the value of the coefficient at a given reference level, which can be the mode, a user-specified level, or a "virtual" level anchored at the mean. For more details, please see Coefficient rescaling below.
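For a log-link model, the translation from raw coefficients to displayed percentages could look like this sketch (a plausible reading of the description above, not Akur8's exact code):

```python
import numpy as np

def displayed_coefficients(beta, ref_index):
    """Translate raw log-link coefficients into displayed percentage factors."""
    C = np.exp(beta)              # raw multiplicative factors, one per level
    return C / C[ref_index] - 1   # 0.10 means the level multiplies by 1.10 vs. C_ref
```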
Normalized versus raw
The observed and predicted values can be shown in different ways: Normalized or Raw.
Switching from one to another can be done by clicking on the following button, just under the Variable Graph:
The Raw values correspond to the weighted average value for each level of the variable, for instance the raw frequency or average cost, as observed or predicted. This option applies only to observations and predictions, not to coefficients.
The Normalized values correspond to a rescaling of the raw values such that the observed average value for the whole modeling database is set to 0%. For a multiplicative model, the rescaling is done using the following formula:
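A plausible form of this rescaling for the multiplicative case (a sketch under that assumption):

```python
import numpy as np

def normalize(raw_values, weights):
    """Rescale raw values so the weighted overall average lands at 0%."""
    overall = np.average(raw_values, weights=weights)  # whole-database average
    return raw_values / overall - 1                    # multiplicative rescaling
```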

Coefficient rescaling
For a given variable, coefficients can be shifted to have another level as the reference level (coefficient at 0% for a multiplicative model). Model predictions are invariant under some coefficient rescalings. We provide three different choices to exploit this invariance:
Initial scaling
This is the initial scaling of the coefficients returned by default, which ensures that for all variables the main effect is centered around 0%.
Scaled by mode
In this case, Akur8 simply takes the value at the mode (the level with the highest exposure) and uses it as a reference. The coefficient value for the mode level will then be at 0%.
Scaled by level
This option allows the user to choose the level they want to have as a reference. The scaling operation will be the same as described above for the mode, except that the level is for the user to choose.
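For a multiplicative model, shifting the reference level amounts to dividing by the factor of the chosen level; a sketch (hypothetical helper):

```python
def rescale_by_level(coeffs_pct, ref_level):
    """Shift percentage coefficients so that `ref_level` sits at 0%."""
    ref = coeffs_pct[ref_level]
    return {lvl: (1 + c) / (1 + ref) - 1 for lvl, c in coeffs_pct.items()}

# Predictions are unchanged as long as the base level absorbs the inverse shift.
```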
Those different ways to rescale can be accessed by clicking on the following icon above the top right corner of the Variable Graph:
Which will then give all the options:
Grouping for ordinal variables
For ordinal variables, you can group levels based on coefficients, quantiles, or weighted quantiles by clicking on the drop-down menu in the top-right corner.
Display bins based on coefficients
For ordinal variables, the variable graphs can be displayed in bins based on coefficients. Levels with equal coefficients are grouped into one bin and all corresponding metrics are re-computed. This is a visualization option to rapidly identify the groupings that have been assessed for a model.
This binning option is persisted for each variable when switching between models in the same project, and it is also persisted in the documentation graph. If the binning option is selected for a variable, the exported graph will appear in its binned version in the file.
Quantile and Weighted Quantile Grouping
To better recognize patterns in noisy variables, you can group results by quantile or weighted quantile by clicking this icon in the top-right corner:
Quantile grouping creates bins that contain roughly the same number of levels. As a consequence, the distribution should keep its overall shape.
Weighted quantile grouping uses a simple heuristic to create bins that contain roughly the same amount of exposure. Please note that in certain cases, especially if the exposure distribution is not uniform, the actual number of bins can be lower than the requested number.
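One simple heuristic matching this description, as a sketch (Akur8's exact rule may differ):

```python
import numpy as np

def weighted_quantile_bins(levels, exposure, n_bins):
    """Assign ordered levels to bins holding roughly equal exposure."""
    cum = np.cumsum(exposure) / np.sum(exposure)
    bins = np.minimum((cum * n_bins).astype(int), n_bins - 1)
    return dict(zip(levels, bins))  # heavy levels can skip bins, so fewer may appear
```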
Note: When grouping by quantile or weighted quantile, coefficients are not shown because they can't be aggregated to the grouping level.
Labels ensure full traceability, and these grouping options are persistent, just like coefficient grouping.
The number of bins can be changed by using the + and - buttons, or by clicking on the displayed number and directly inputting a value:
This feature is also available in the Comparison mode functionality.
Grouping for categorical variables
For categorical variables, there is only one option: smallest exposure grouping. This groups together the levels with the lowest amount of exposure.
The number of groups can be changed by using the + and - buttons, or by clicking on the displayed number and directly inputting a value. The group will be on the left of the graph.
Sorting for categorical variables
The modalities of categorical variables can be sorted by exposure, observed or predicted values after clicking this icon:
Time consistency
Checking for time consistency of a variable is done via the interaction between the variable and the Date variable (specified in the Goals as per Date).
To visualize this interaction, you may select the "Time Consistency" option in the "Graph Type" field under the Variable Graph:
The interactions are then displayed in the following way:
If this interaction is significant (i.e. the coefficients are noticeably different), the model is not consistent throughout the years.
The interaction is fitted up to a significance value which is determined by the smoothness of the considered model.
It is also possible to visualize, for each Date level, the Observed and Predicted values by selecting the "Observed/Predicted" option in the "Option" field under the variable graph: