
Banking Virtual Assistant with Fine-Tuned LLM and RAG
A multi-agent virtual assistant system for banking applications that combines RAG-based knowledge retrieval and fine-tuned LLM function calling to safely answer questions and execute banking actions in Bahasa Indonesia.
Our thesis is titled Virtual Assistant System in Banking Application With Fine-Tuned LLM and RAG Approach.
Instead of building a single AI that tries to do everything, we implemented a multi-agent orchestrated system. Basically, multiple AI agents collaborate, each with its own job.
Here’s the simple breakdown of how the system works.
First, we have the Supervisor Agent. You can think of it like the manager. Every time a user sends a message, the Supervisor reads it and decides which agent should handle the request.
Then there’s the RAG Agent, which acts as the information specialist. If the user asks something like “What are the fees for opening a new account?”, the Supervisor forwards that question to this agent.
And finally, we have the Function Call Agent. This one handles real actions. If a user says something like “Transfer 50 dollars to mom”, the Supervisor routes it here. The agent then translates the user’s request into secure backend commands that the banking system can execute.
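The routing idea above can be sketched in a few lines. This is a deliberately minimal illustration: in the real system the Supervisor is itself an LLM agent, and the keyword list and function names below are assumptions made up for the example, not part of the thesis.

```python
# Minimal sketch of supervisor-style routing. In the actual system the
# Supervisor Agent is an LLM; here a keyword check stands in for it, and
# the keywords themselves are illustrative assumptions.

ACTION_KEYWORDS = ("transfer", "kirim", "bayar", "cek saldo")  # assumed examples

def route(message: str) -> str:
    """Decide which agent should handle the user's message."""
    lowered = message.lower()
    if any(keyword in lowered for keyword in ACTION_KEYWORDS):
        return "function_call_agent"  # real actions go to the Function Call Agent
    return "rag_agent"                # informational questions go to the RAG Agent

print(route("Transfer 50 dollars to mom"))                 # -> function_call_agent
print(route("What are the fees for opening a new account?"))  # -> rag_agent
```

The point is only the shape of the decision: one component reads every message and hands it to exactly one specialist.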
Preventing AI from Making Things Up
One big problem with standard AI models is something called hallucination.
Sometimes the AI sounds confident, but the information it gives is completely made up. In most situations that’s annoying, but in banking, that’s a serious problem.
To reduce this risk, we implemented RAG (Retrieval-Augmented Generation).
A simple way to think about RAG is like giving the AI an open-book test. Instead of relying only on what it “remembers” from training, the AI first searches a verified knowledge base before answering.
In our case, the RAG agent retrieves information from a curated database of banking rules and policies based on data from a digital bank in Indonesia.
During testing, we found that the system worked best using hybrid search. Instead of only matching exact keywords or only searching based on semantic meaning, hybrid search combines both approaches. This helps the system find more accurate and relevant answers.
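One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The thesis does not specify its exact fusion method, so treat this as one plausible sketch of what "hybrid search" can mean, with made-up document ids:

```python
# Sketch of hybrid search via reciprocal rank fusion (RRF). The exact fusion
# used in the system isn't stated here; RRF is one standard option. Document
# ids and rankings below are illustrative.

def rrf(keyword_ranking, semantic_ranking, k=60):
    """Fuse two ranked lists of document ids into one hybrid ranking."""
    scores = {}
    for ranking in (keyword_ranking, semantic_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document scores higher the closer to the top it appears
            # in either ranking; k dampens the influence of any one list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "fees_policy" tops both lists, so it tops the fused ranking too.
keyword_hits  = ["fees_policy", "account_types", "limits"]
semantic_hits = ["fees_policy", "promo_terms", "account_types"]
print(rrf(keyword_hits, semantic_hits))
```

A document that appears near the top of either list gets credit, so exact-keyword matches and semantically similar passages both surface in the final results.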
With a carefully cleaned and structured dataset, our RAG system achieved a score of 0.572, outperforming several standard benchmark models.
Giving the AI “Hands” to Do the Work
Answering questions is useful, but a banking assistant should also be able to perform actions, like checking balances or transferring money.
This is where Function Calling comes in.
We fine-tuned a Large Language Model so it could understand banking commands written in Bahasa Indonesia and convert them into structured function calls.
To train the model, we created a custom dataset of more than 20,000 simulated conversations. This allowed the AI to learn how different user requests map to specific banking features.
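To make the idea concrete, here is the general shape of one such training pair: an Indonesian user request next to the structured function call the model should produce. The field names, function name, and arguments are assumptions for illustration, not the thesis dataset's actual schema.

```python
# Illustrative shape of one function-calling training example. All names
# here (field names, "transfer_funds", its arguments) are assumptions for
# the sake of the example, not the real dataset schema.
import json

example = {
    "user": "Transfer Rp50.000 ke rekening ibu",  # "Transfer Rp50,000 to mom's account"
    "function_call": {
        "name": "transfer_funds",
        "arguments": {"amount": 50000, "currency": "IDR", "recipient": "ibu"},
    },
}

# The fine-tuned model learns to emit the structured call as text; the
# banking backend then parses and validates it before executing anything.
print(json.dumps(example["function_call"], ensure_ascii=False))
```

Training on tens of thousands of pairs like this is what teaches the model to map free-form Bahasa Indonesia requests onto specific backend features.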
The Results
The fine-tuning process significantly improved the model’s performance.
When tested on single-request tasks, the fine-tuned Indonesian model achieved an accuracy of up to 83%.
For comparison, the base model scored only around 35%, so fine-tuning delivered a dramatic improvement. The fine-tuned model also outperformed several other open-source models in the same evaluation.
Current Limitation
However, the system still has one limitation.
The AI struggles with multi-turn conversations, that is, situations where the interaction requires multiple back-and-forth messages.
For example, if the user gives incomplete information and the AI needs to ask follow-up questions, the system sometimes gets confused. This happens because the training dataset focused mostly on single, direct commands, rather than long conversational flows.
The Bottom Line
By combining smart request routing, retrieval-based fact checking, and a fine-tuned action model, this system shows a practical blueprint for the future of digital banking assistants.
The goal is simple: reduce the complexity of banking apps and make financial services easier to access, even for users who aren't very tech-savvy.
Paper reference: https://ieeexplore.ieee.org/abstract/document/11327124