Thorough Analysis of Loan Credit Risk
Analysis of loan Credit Risk
Tobou Egbekun
9/1/20253 min read


Tools: Power BI, SQL, Excel
Source of Data: Kaggle
Project Goal
This is an original dataset that was prepared by a certain Professor Hofmann, it originally had 20 columns of data but we only needed 9 columns for the purpose of this project which is to find out the demographic and purpose of people taking loans.
Executive Summary
People with Housing Status “Free” applied for the most loans(credit)
33.7%(the most) of loans were taken for the purpose of cars
The age group taking the most loans are between ages 25-40(577 loans)
The age group that has taken the highest average amount of loans(3434) are between ages 45-60
Methodology
I started cleaning the data by checking for null values, then I started my exploratory data analysis:
SELECT
SUM(CASE WHEN saving_accounts IS NULL THEN 1 ELSE 0 END) AS missing_saving,
SUM(CASE WHEN checking_account IS NULL THEN 1 ELSE 0 END) AS missing_checking
FROM german_credit_mock;
Result: The output for both the saving and checking account came back as 0.
Validating Job Categories:
There are 4 kinds of job identifiers between 0-3.
Job Categories
0- unskilled,non-resident
1 - unskilled and resident,
2 - skilled
3 - highly skilled
SELECT DISTINCT Job
FROM german_credit_mock
WHERE Job NOT IN (0,1,2,3);
Result: The output came up with nothing as a confirmation that we have only four categories.
Average credit amount by housing type
SELECT Housing, AVG(credit_amount) AS avg_credit
FROM german_credit_mock
GROUP BY Housing
ORDER BY avg_credit DESC;
Result: Identified people with the housing type “free” as people with the highest form of credit or loans.
Distribution of loan Purposes
SELECT Purpose, COUNT(*) AS total_loans,
ROUND(100.0 COUNT() / (SELECT COUNT(*) FROM german_credit_mock), 2) AS percentage
FROM german_credit_mock
GROUP BY Purpose
ORDER BY total_loans DESC;
Result: Identified that 33.7% of people took loans for the purpose of getting cars
Discovery of age group that take the highest loans
SELECT CASE
WHEN Age < 25 THEN 'Under 25'
WHEN Age BETWEEN 25 AND 40 THEN '25-40'
WHEN Age BETWEEN 41 AND 60 THEN '41-60'
ELSE '60+'
END AS age_group,
AVG(credit_amount) AS avg_credit,
COUNT(*) AS num_loans
FROM german_credit_mock
GROUP BY age_group
ORDER BY avg_credit DESC;
Result: identified Age group between 41-60 with the highest average amount of loans($3434). 25-40 takes the most loans(577 loans).
Credit risk by savings
SELECT saving_accounts,
AVG(credit_amount) AS avg_credit,
AVG(Duration) AS avg_duration
FROM german_credit_mock
GROUP BY saving_accounts
ORDER BY avg_credit DESC;
Result: Identified a Group known as NA and Moderate as the groups of people that took the longest to pay back their loans. People in the quite rich category took the least amount of time.
Note: People with Saving accounts are classified as little, moderate, quite rich and Rich. There was a category unidentified called NA which had the highest number of loans and highest duration to pay.
Cross check job type vs credit eligibility
SELECT Job, AVG(credit_amount) AS avg_credit, COUNT(*) AS total
FROM german_credit_mock
GROUP BY Job
ORDER BY avg_credit DESC;
Result: Identified that highly skilled individuals(148) were lended the highest amount of loans($5,435).
Key Insights and Data Driven Recommendation
Insight: Auto-Loan Opportunity (33.7% of loans are for cars)
RecommendationSince over one-third of loans are for cars, banks should design specialized auto-loan products (e.g., lower interest rates, flexible repayment for younger borrowers).
Target marketing efforts towards ages 25–40, who make up the largest loan-taking segment by volume.
Insight on Premium Segmentation: Highly Skilled Workers (avg. loan = $5,435)
RecommendationHighly skilled professionals receive the highest credit amounts.
This group can be targeted with premium financial products (credit cards, investment-linked loans, bundled insurance) since they represent a low-risk, high-value customer segment.
Insight on Risk Management: “NA” Savings Accounts Segment
RecommendationCustomers with “NA” savings accounts take longer loan durations and represent potential credit risk.
Banks should:
Investigate whether “NA” means unbanked customers or simply missing data.
Introduce stricter eligibility checks or higher collateral requirements for this group.
Insight: Housing Factor in Loan Approval
RecommendationCustomers with “Free” housing have higher credit amounts.
These individuals may have lower living expenses and hence greater repayment capacity.
Adjust credit scoring models to factor in housing status as a key variable.
Age Group Insights
RecommendationAges 25–40 → Highest loan demand by count. Banks should focus on volume-driven loan packages for this group.
Ages 41–60 → Highest average loan size ($3,434). Banks should tailor longer-term financial products (e.g., home improvement loans, business expansion credit) for this group.
Geared Product Bundling
RecommendationOffer cross-sell products: for example, pair auto-loans with car insurance or credit monitoring tools.
For younger borrowers (25–40), bundle loans with financial literacy perks (budgeting apps, loyalty programs) to improve retention.
Summary of Recommendations
Launch targeted auto-loan packages for young professionals.
Develop premium financial solutions for highly skilled workers.
Tighten risk controls for NA savings borrowers.
Factor housing status into loan scoring models.
Segment age groups into volume-driven vs value-driven loan strategies.