Modelling Credit Risk for Personal Loans Using Product-Limit Estimator

A product-limit approach was adopted to estimate time to default for male and female loan applicants. For each group, a sample of 250 applicants was observed for 30 months. The life of the account is measured from the month it was opened until the account becomes ‘bad’, is closed, or reaches the end of observation. In line with industry practice, the account is considered bad if payment is not made for two consecutive months. If the account does not miss two consecutive payments and is closed or survives beyond the observation period, it is considered to be censored. The results showed that there is no significant difference between male and female applicants in terms of their survival times and hazard rates.


Introduction
Traditional credit risk models aim at determining a customer's probability of defaulting on loan repayment. This study used survival analysis, a method that originated in the study of time to death, or to the occurrence of any other event, in lifetime data. The technique has also been applied in engineering to model the failure time of components and parts. In credit risk modelling, the event of interest is default, so the quantity modelled is the time to default on loan obligations.
Modelling of credit risk using survival analysis was first introduced by Narain (1992), and Thomas et al. (1999) developed the model further. Narain (1992) applied the survival model to 24 months of loan data. The results showed that the survival analysis approach provides more detailed and relevant information for credit management than the conventional approaches. Thomas et al. (1999) applied the technique by fitting the accelerated life exponential model to 24 months of loan data. Their results also showed the approach to be superior to conventional credit scoring methods, in that a better credit-granting decision could be made if the score was supported by the estimated survival times. Thomas et al. (1999) also compared exponential, Weibull and Cox models with logistic regression and concluded that the survival method was a better modelling tool.
In the literature, a number of techniques have been applied to model credit risk. Durand (1941) pioneered the use of discriminant analysis for credit scoring. Orgler (1970) applied regression analysis in a model for commercial loans. Wiginton (1980) was one of the first to publish credit scoring results using logistic regression, comparing it with discriminant analysis. Leonard (1993) also applied logistic regression in evaluating commercial loans. Decision trees and rules were adopted by Murkowski (1985) and Mehta (1968) for credit scoring.
Other techniques include K-Nearest Neighbour classifiers, which were used by Chatterjee and Barcun (1970) and Henley and Hand (1996). Baesens et al. (2003) studied the use of Bayesian network classifiers to rate borrowers. Linear programming was applied by Hardy and Adrian (1985), who compared it with other statistical approaches.

Descriptive Methods of Time-to-event
Survival analysis is a statistical method for modelling the time to some event for a population of individuals. For example, the event may be death in medical applications, recidivism of released prisoners in criminology, or the first purchase of a new product by a customer in marketing studies. The time to the occurrence of the event is termed the survival time or lifetime. In credit risk modelling, the event is default on a loan, and the lifetime is therefore the time to default, T.
Default times are subject to random variation and are thus random variables. There are five standard ways to describe their randomness, among them the distribution function, the survivor function and the hazard function. These five formulations are mathematically equivalent, but they highlight different aspects of the default time. The distribution function gives the probability that default occurs at or before time t. Conversely, the survivor function is the probability that default does not occur at or before time t; in other words, the loan survives (does not default) at least to time t. The interpretation of the hazard function is slightly trickier. It is the "rate" at which a borrower defaults at time t, conditional on the borrower staying on the books up to that time. Note that the hazard is a rate, not a probability, and can therefore be greater than one.
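For a continuous default time T with density f(t), the three functions named above can be written out explicitly. The symbols F, S, f and h are notation assumed here for illustration:

```latex
\begin{align*}
F(t) &= P(T \le t), \\
S(t) &= P(T > t) = 1 - F(t), \\
h(t) &= \lim_{\Delta t \to 0^{+}} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.
\end{align*}
```

Because h(t) is measured per unit of time, its value depends on the time unit chosen. For example, a constant hazard of 2 per year is perfectly valid and corresponds to S(t) = exp(-2t) with t in years.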
In survival analysis, one must consider a key analytical problem called censoring. In essence, censoring occurs when we have some information about an individual's survival time but do not know the exact survival time. There are a number of types of censoring, such as random, interval, left, and right censoring. In credit scoring applications, most censored cases are right censored.
For example, suppose we follow a group of borrowers for 3 years. If we observe that borrower A fails to repay in the 15th month, he is classified as a default case and his default time is 15. On the other hand, consider borrower B, who repays on time during the whole observation period. We do not know his exact default time but are sure that it must be greater than 36. Borrower B is therefore a right-censored observation. Another example of right censoring arises when borrower C repays on time from the 1st month to the 12th month, after which we have no information on his repayment pattern. As with borrower B, we do not know the exact default time of borrower C; we only know that it must be greater than 12. This is also an example of right censoring.
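The three borrowers above can be encoded as (duration, event) pairs, which is the form most survival routines expect. The sketch below is illustrative only: the `Observation` class and the `encode` helper are hypothetical names introduced here, not part of the study.

```python
from dataclasses import dataclass

OBSERVATION_MONTHS = 36  # follow-up window from the example (3 years)


@dataclass
class Observation:
    borrower: str
    duration: int  # months on the books until default or censoring
    event: int     # 1 = default observed, 0 = right censored


def encode(borrower, default_month=None, last_seen_month=OBSERVATION_MONTHS):
    """Turn what was observed about a borrower into a (duration, event) record."""
    if default_month is not None:
        return Observation(borrower, default_month, 1)
    # No default seen: the true default time exceeds the last month observed.
    return Observation(borrower, min(last_seen_month, OBSERVATION_MONTHS), 0)


records = [
    encode("A", default_month=15),    # defaulted in month 15
    encode("B"),                      # still repaying at month 36
    encode("C", last_seen_month=12),  # lost to follow-up after month 12
]
for record in records:
    print(record)
```

Both B and C carry event = 0; only the duration distinguishes how long each was known to be repaying.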

Statement of the Problem
Since 2003, the Kenyan financial market has experienced growing liquidity, which has led banks to market various loan products vigorously. This has given rise to the need to review the banks' credit-granting criteria to reflect the growing volume of the loan portfolio and to respond to the current global credit crunch. However, research on credit risk has received surprisingly little attention from both practitioners and scholars in Kenya and on the wider African continent. Over the years, banks have continued to rely on traditional credit scoring techniques to rate loan applicants.
A number of studies have been carried out on credit risk modelling using different approaches. A limited number of studies have applied survival analysis techniques, but none has used the product-limit method to analyse credit risk. To this end, the research set out to model the probability of servicing loans and the hazard rates for both male and female borrowers using this method. Furthermore, existing credit scoring models classify borrowers into different risk categories but cannot provide any information on when the borrower is likely to default. It is more informative for the lender to know not only the probability of default but also when the default is likely to happen. This helps to price risks fairly and improve the focus on ultimate profitability. For instance, if the lender knows that a group of loan applicants is of the "bad" type, then instead of rejecting their applications it may grant them loans at a higher interest rate, as long as the term of the loan is shorter than the likely time to default. Thus some "bad" applicants can also be viewed as profitable propositions.

General Objective
The broad objective of this research was to use the product-limit survival model to generate default probabilities at various points in time. The study also intended to test the equality of survival between the two risk groups, namely male and female applicants.

Specific Objectives
The specific objectives were:
1. To estimate time to default using the product-limit estimator for each risk group.
2. To determine the hazard rate for each risk group on the basis of the product-limit estimator.
3. To test the statistical significance of the differences between the survival curves of the risk groups using the log-rank test.

The Product-Limit Estimator
This function estimates survival rates and hazard from data that may be incomplete.
The survival rate is expressed as the survivor function (S):
\[ S(t) = \frac{\text{number of individuals surviving longer than } t}{\text{total number of individuals studied}} \]
where t is a time period known as the survival time, time to failure or time to event (such as death), e.g. 5 years in the context of 5-year survival rates. Some texts present S as the estimated probability of surviving to time t for those alive just before t multiplied by the proportion of subjects surviving to t. This is a univariate method which generates the characteristic "stair step" survival curves. It is also called the Kaplan-Meier estimator. The survival curves for the risk groups were compared using the log-rank test.
Hypothesis test: H0: there is no difference between the survival curves of the two groups. H1: the survival curves of the two groups are different.
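A minimal sketch of the two-group log-rank statistic follows, assuming the usual null hypothesis of equal survival in both groups. The function name `logrank_test` and the toy durations are hypothetical and are not the study's data. The p-value is taken from the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # observed minus expected defaults in group A
    variance = 0.0       # hypergeometric variance summed over default times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: months on the books and default flags (1 = default).
male_t, male_e = [5, 8, 12, 30, 30, 30], [1, 1, 0, 0, 0, 0]
female_t, female_e = [7, 10, 30, 30, 30, 30], [1, 0, 0, 0, 0, 0]
chi2, p = logrank_test(male_t, male_e, female_t, female_e)
print(f"chi-square = {chi2:.3f}, p-value = {p:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) means the hypothesis of equal survival curves is not rejected.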

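For ordered time points $0 = t_0 < t_1 < \dots < t_k$, the survivor function factorises into a chain of conditional probabilities:
\[
\begin{aligned}
S(t_k) = P(T > t_k) &= P(T > t_k \mid T > t_{k-1})\, P(T > t_{k-1}) \\
&= P(T > t_k \mid T > t_{k-1})\, P(T > t_{k-1} \mid T > t_{k-2})\, P(T > t_{k-2}) \\
&\;\;\vdots \\
&= \prod_{j=1}^{k} P(T > t_j \mid T > t_{j-1}), \qquad \text{since } P(T > t_0) = 1 .
\end{aligned}
\]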
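The product-limit estimator of the survivor function at time t, for $t_k \le t < t_{k+1}$, is
\[ \hat{S}(t) = \prod_{j=1}^{k} \left( 1 - \frac{d_j}{n_j} \right), \]
where $t_1 < t_2 < \dots$ are the ordered default times, $n_j$ is the number of accounts at risk just before $t_j$ and $d_j$ is the number of defaults at $t_j$. The standard error of estimation (for large samples) is given by Greenwood's formula,
\[ \operatorname{se}\{\hat{S}(t)\} \approx \hat{S}(t) \left[ \sum_{j=1}^{k} \frac{d_j}{n_j (n_j - d_j)} \right]^{1/2} . \]
Hazard function estimation for the product-limit estimator uses, in its standard discrete-time form, the proportion of accounts at risk that default at each default time,
\[ \hat{h}(t_j) = \frac{d_j}{n_j} . \]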
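The product-limit calculation can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function name `product_limit` and the toy data are hypothetical, the standard error uses Greenwood's formula, and the hazard at each default time is taken as defaults divided by accounts at risk.

```python
import math


def product_limit(durations, events):
    """Return one row (t, n_at_risk, defaults, S_hat, std_err, hazard) per default time."""
    rows = []
    survival = 1.0
    greenwood_sum = 0.0
    default_times = sorted({u for u, e in zip(durations, events) if e == 1})
    for t in default_times:
        n = sum(1 for u in durations if u >= t)  # at risk just before t
        d = sum(1 for u, e in zip(durations, events) if u == t and e == 1)
        survival *= 1 - d / n                    # multiply in P(T > t | T >= t)
        if n > d:
            greenwood_sum += d / (n * (n - d))
            std_err = survival * math.sqrt(greenwood_sum)
        else:
            std_err = float("nan")  # Greenwood's formula is undefined once S_hat is 0
        rows.append((t, n, d, survival, std_err, d / n))
    return rows


# Hypothetical toy data: months on the books and default flags (1 = default).
durations = [2, 3, 3, 5, 6]
events = [1, 1, 0, 1, 0]
for t, n, d, s, se, h in product_limit(durations, events):
    print(f"t={t:>2}  at risk={n}  defaults={d}  S={s:.3f}  se={se:.3f}  hazard={h:.3f}")
```

Censored accounts never trigger a step in the curve; they only shrink the number at risk at later default times, which is how incomplete observations still contribute information.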

The Research Findings
The model outputs were as follows. [Output table for gender factor 0 (female) not recoverable.] Gender 1 refers to male borrowers and 0 refers to female borrowers. Event 1 denotes loan default and 0 denotes censored state.
The survival data output for gender 1 implies that, out of the 250 male applicants for loans maturing in 30 months, 11 defaulted and 4 settled their loan accounts before maturity. A mean survival time of 15 means that, on average, a male applicant takes 15 months to default. The same interpretation applies to the data on female applicants. The summary output that was also generated, together with the survival curves, indicates that there is no significant difference in the survival curves for male and female borrowers.
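One common way to obtain a mean survival time from a product-limit curve is to take the area under the estimated step function up to the end of observation. Whether the reported mean was computed this way is an assumption; the sketch below only illustrates that definition. The function name `restricted_mean` and the step values are hypothetical.

```python
def restricted_mean(step_times, step_survival, horizon):
    """Area under a right-continuous survival step curve from 0 to horizon.

    step_times are the ordered times at which the curve drops, and
    step_survival holds the value of the curve from each drop onwards.
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0  # the curve starts at S(0) = 1
    for t, s in zip(step_times, step_survival):
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (horizon - prev_t)  # final rectangle up to the horizon
    return area


# Hypothetical curve: drops to 0.8, 0.6 and 0.3 at months 2, 3 and 5.
print(restricted_mean([2, 3, 5], [0.8, 0.6, 0.3], horizon=6))
```

When many accounts are censored at the end of observation, this quantity is bounded above by the observation period, so it should be read as a mean restricted to that period.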

Conclusions
The research findings show that there is no significant difference between male and female borrowers in terms of their time to default on loan obligations. This suggests that, in this sample, gender does not affect credit risk. Mean survival times can guide the credit-granting process towards an average loan maturity that minimizes default losses and optimizes profitability.

Recommendations and Suggestions for Further Research
This method of credit risk modelling is quite reliable because, unlike parametric methods, it makes no assumptions about the distribution of loan default times. However, given that the product-limit estimator is a univariate method, it may be more informative to adopt multivariate techniques such as the Cox model to model credit risk. Further research can therefore be conducted on the same data set using other survival techniques.