
Market Segmentation for Life Insurance

June 27, 2023

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with more than 18 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specialising in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Market Segmentation for Life Insurance Analysis using Customer Credit Card Data (K-means Clustering)

The business problem chosen here is market segmentation: dividing prospective customers into groups, or segments, based on their shared needs and their likelihood of responding favourably to a marketing initiative such as a promotion or an offer. Market segmentation enables insurance businesses to target distinct consumer groups that differ in the goods and services they value. Many firms can use this approach to personalise and specialise their marketing campaigns. A study of this kind provides useful insights into customer behaviour and helps anticipate how similar future customers are likely to behave. The same research can serve a variety of purposes. For example, it can help target client groups with suggestions for savings plans, loans, wealth management strategies, insurance plans, and other products. The customers' credit card information was obtained from an open-source database. The transactional data contains 18 behavioural variables relating to the customers' credit cards. As part of data understanding, each attribute was examined for its data type and its significance in the dataset.

Source: voxco.com

The attribute customer id identifies the credit cardholder. Balance and balance frequency give the balance left in the cardholder's account for further purchases and how frequently that balance is updated, respectively. Several variables relate to purchases made by cardholders: purchases, one-off purchases (the maximum amount spent in a single transaction), installment purchases, purchases frequency, one-off purchases frequency, and the number of purchase transactions. Cash advance activity is recorded as cash advance, cash advance frequency, and cash advance transactions. The cardholder's credit limit, payments made in full, and minimum payments made were also considered for analysis. Finally, the data includes the tenure of the credit card service for the cardholder's account. Based on these features, we divide customers into groups according to their similarities. We use k-means clustering to segment customers from their credit card details and apply the result to the insurance industry; the same analysis could be applied to a range of other sectors.


Data source: Market Segmentation in Insurance Unsupervised

Fig 1: Sample dataset of the use case

Fig 2: Sample dataset of the use case

To address the market segmentation problem statement, we must identify our target audiences, gather data, and organise the data into categories. In k-means clustering, we look for similarities, choose the number of clusters, name the resulting groups, and then assign new clients to them according to those similarities. The rest of this article discusses the algorithm, the data used, and the use case of market segmentation for life insurance analysis using customers' credit card data.

K-means clustering works in several steps. First, the algorithm picks random points as centroids (cluster centres), one for each of the clusters we decide on. It then uses a distance measure, here Euclidean distance, to assign each data point to its nearest centroid. In this way, the algorithm partitions the whole dataset into clusters based on the distances between data points and centroids. If we decide on two clusters (k=2), the process starts by randomly choosing two centroids and, based on which centroid each data point is closer to, divides the data into two clusters.

Source: Wikipedia

A few more steps remain. After each assignment, every centroid is recomputed as the mean of the points assigned to it, so the centroid values change from one iteration to the next, and the distances are measured again. This process continues until the centroid values converge; at that point we halt the procedure and take the final clusters.
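The steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used for the analysis: the function names `assign` and `kmeans`, the tolerance, and the synthetic two-blob data are all assumptions made for the example.

```python
import numpy as np

def assign(X, centroids):
    """Return, for each row of X, the index of the nearest centroid (Euclidean)."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Plain k-means with random starting centroids (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points at random as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = assign(X, centroids)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids have effectively stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids.
    return assign(X, centroids), centroids

# Synthetic demo data: two well-separated blobs (not the credit card data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```

With k=2, the two returned centroids should sit near the centres of the two blobs, which mirrors the two-cluster example in the text.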

Market segmentation is of four types: demographic, psychographic, geographic, and behavioural. With the current data, we look at behavioural analysis, since we have details on purchase and repayment patterns. According to various economic researchers, economic structure plays a vital role in determining consumer spending in different countries, and this holds whenever a customer has to buy something, whether from an insurance company or from any other industry.

Many financial institutions, credit card issuers, and merchants monitor changes in credit card users' spending trends to better understand consumer purchasing patterns. These statistics are combined with publicly available information to describe a customer's demographic status. In this case, our analysis uses k-means clustering to place a new client in a segment based on their similarity to existing customers who have already been segmented. Once this categorisation is complete, insurance payments could be estimated on an annual or monthly basis, and quotes issued to customers accordingly.

Data pre-processing and exploratory data analysis play an important role in building the right model. We began by checking each column for missing values. The credit limit and minimum payments columns had missing values, which were imputed with the column means. The data was checked for duplicates, and none were found. The only nominal attribute in the dataset is the customer id (CUST_ID), which was dropped because it was irrelevant to the analysis.

Fig 3: Missing values or null values in credit card data.
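The cleaning steps described above could look roughly like this in pandas. The frame below is a tiny synthetic stand-in, and the column names other than CUST_ID (BALANCE, CREDIT_LIMIT, MINIMUM_PAYMENTS) are assumed for illustration; the real dataset's columns should be checked before reusing the code.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame standing in for the credit card data (values are made up).
df = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4"],
    "BALANCE": [40.9, 3202.5, 2495.1, 1666.7],
    "CREDIT_LIMIT": [1000.0, 7000.0, np.nan, 7500.0],
    "MINIMUM_PAYMENTS": [139.5, 1072.3, 627.3, np.nan],
})

# Count missing values per column.
missing_before = df.isnull().sum()
print(missing_before)

# Fill the gaps in the two affected columns with each column's mean.
for col in ["CREDIT_LIMIT", "MINIMUM_PAYMENTS"]:
    df[col] = df[col].fillna(df[col].mean())

# Check for duplicate rows, then drop the identifier column.
n_duplicates = int(df.duplicated().sum())
features = df.drop(columns=["CUST_ID"])
print(n_duplicates, features.isnull().sum().sum())
```

Mean imputation is simple and keeps every row, at the cost of pulling the filled values towards the centre of the distribution.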

Outliers in the data were identified, and extreme values were eliminated as part of outlier treatment. A heatmap visualisation revealed the correlations between the variables. Because the numerical variables were on different scales, the data was standardised.

Fig 4: The existence of outlier details in credit card data

Fig 5: Heatmap displaying the correlation details between different variables in data
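As a rough sketch of the standardisation step, scikit-learn's `StandardScaler` rescales each column to zero mean and unit variance. The two synthetic columns below are assumptions chosen only to show features on very different scales; the outlier rule is not shown because the article does not name the one used.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (not the credit card data):
# one in the thousands, one between 0 and 1.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(1500, 2000, 200),
    rng.uniform(0, 1, 200),
])

# Standardise each column: subtract its mean, divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

Without this step, the large-valued column would dominate the Euclidean distances that k-means relies on.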

As part of pre-processing, the dimension reduction technique principal component analysis (PCA) was applied to the data, reducing the whole dataset to two columns, pca1 and pca2.
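A minimal sketch of this reduction with scikit-learn follows. The input is random synthetic data with six columns, an assumption made so the example runs on its own; only the output column names pca1 and pca2 come from the article.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardised feature matrix (6 made-up columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X_scaled = StandardScaler().fit_transform(X)

# Project the data onto its first two principal components.
pca = PCA(n_components=2)
pca_df = pd.DataFrame(pca.fit_transform(X_scaled), columns=["pca1", "pca2"])

# Share of the total variance retained by each component.
explained = pca.explained_variance_ratio_
print(pca_df.head())
print(explained)
```

Checking `explained_variance_ratio_` is a quick way to see how much information the two-column view keeps.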

We used the k-means algorithm, an iterative algorithm that splits the dataset into “K” pre-defined, non-overlapping subgroups (clusters) such that each data point belongs to exactly one group. With clustering algorithms, selecting the number of clusters is considered a challenging part of the process. Several approaches can be used for this, including the inertia-based elbow curve and silhouette analysis. Inertia is the sum of the squared distances from every point in the dataset to its closest cluster centre. Lower inertia means tighter clusters, but inertia keeps falling as more clusters are added, so we look for the “elbow” where further clusters bring little improvement. In the current analysis, we used the inertia and elbow curve method to select the number of clusters.

Fig 6: The inertia and elbow curve plotted for the credit card data
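One way to produce the numbers behind an elbow curve is to fit k-means for a range of cluster counts and record each model's `inertia_`. The sketch below assumes synthetic blob data from `make_blobs` and an arbitrary range of 1 to 8 clusters; neither comes from the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four blobs (not the credit card data).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

# Fit k-means for each candidate k and record the inertia
# (sum of squared distances of points to their closest centre).
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these values against k and looking for the point where the curve flattens gives the elbow.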

As noted above, principal component analysis reduced the dataset to two columns (pca1 and pca2). We first applied k-means clustering to the scaled data and chose 4 clusters based on the elbow curve. For better visualisation, we attached the same cluster labels to the PCA data as were assigned to the original features.

Fig 7: Code showing k-means algorithm fitting and visualisation

Fig 7: The scatter plot showing 4 clusters segmented based on similarities.
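The fitting step might look like the following sketch, which is not the code shown in the figure. It clusters standardised synthetic data into 4 groups and copies the labels onto a two-column PCA frame; the data, the `cluster` column name, and the random seeds are assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned features (6 made-up columns, 4 blobs).
X, _ = make_blobs(n_samples=400, centers=4, n_features=6, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means on the scaled data with the 4 clusters chosen from the elbow curve.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=1)
cluster_labels = kmeans.fit_predict(X_scaled)

# Attach the same labels to the two PCA columns for plotting.
pca_df = pd.DataFrame(
    PCA(n_components=2).fit_transform(X_scaled), columns=["pca1", "pca2"]
)
pca_df["cluster"] = cluster_labels
print(pca_df["cluster"].value_counts().sort_index())
```

A scatter plot of pca1 against pca2, coloured by the cluster column, then gives a two-dimensional view of the segments.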

Finally, based on these results, the whole dataset was grouped by cluster for the subsequent labelling step. Other clustering methods, such as Agglomerative Clustering, Spectral Clustering, Gaussian Mixture Model-based clustering, and DBSCAN, could also be applied and their results compared and evaluated. The Silhouette Coefficient (silhouette score) can also be used to select the number of clusters.

Fig 8: Sample data with clusters and grouped data.
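For the silhouette alternative mentioned above, a possible sketch is to score each candidate cluster count with `silhouette_score` and compare. The synthetic blobs and the range of 2 to 7 clusters are assumptions; this check was not part of the analysis reported here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four blobs (not the credit card data).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# The silhouette score needs at least 2 clusters; values closer to 1 are better.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, value in scores.items():
    print(k, round(value, 3))
print("highest silhouette score at k =", best_k)
```

Unlike inertia, the silhouette score does not improve automatically as clusters are added, so its maximum can be read off directly.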

According to the findings, cluster 1 has a low balance and few purchases. In cluster 3, both the balance and the purchase amounts were high. Cluster 2 had a lower balance and fewer purchases than cluster 3. Cluster 0 had average balance and purchases compared with the other groups. The same pattern persisted across the other variables. In behavioural segmentation, we can categorise clients by their purchasing power and use this information to design insurance plans and offers. On this basis, the clusters could be named as follows: cluster 3 as customers who make the greatest purchases, cluster 2 as customers who make moderate purchases, cluster 0 as customers who make medium purchases, and cluster 1 as customers who make the lowest purchases.
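Cluster profiles of this kind can be obtained by averaging each feature within a cluster. The sketch below uses a handful of made-up rows; the numbers, the column names BALANCE and PURCHASES, and the `cluster` column are hypothetical and do not reproduce the results reported above.

```python
import pandas as pd

# Made-up rows with an assumed cluster label per customer (illustration only).
df = pd.DataFrame({
    "BALANCE":   [1500.0, 1700.0, 300.0, 500.0, 1000.0, 1200.0, 5000.0, 6000.0],
    "PURCHASES": [1000.0, 1200.0, 100.0, 200.0, 700.0, 900.0, 4000.0, 5000.0],
    "cluster":   [0, 0, 1, 1, 2, 2, 3, 3],
})

# Average balance and purchases per cluster.
profile = df.groupby("cluster")[["BALANCE", "PURCHASES"]].mean()

# Order the clusters from lowest to highest average purchases.
ranked = profile.sort_values("PURCHASES")
print(ranked)
```

Reading the ranked table from top to bottom gives a natural basis for naming segments by purchasing power.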
