16:00 - 16:30
Location: Digital Scholarship Lab Foyer (G/F University Library)
Submission 22
Hybrid Augmentation: Building a RAG-CAG Framework to Support Linguistic Research
D1_TPoster-11
Presented by: Rui SHANG
Rui SHANG 1, Ke Wang 2
1 Westlake University
2 Zhejiang University
As large language models (LLMs) rapidly develop in research support, many institutions have already adopted commercial GenAI tools, such as ZJU's Deepseek and HKU's ChatGPT service. While these tools demonstrate solid capabilities in general tasks, they often fail to provide convincing answers to prompts related to a specific academic field (Singh et al., 2025). This defect may stem from several causes, such as ever-changing knowledge, hallucination, input truncation, and poor reasoning in current LLMs (Kasneci et al., 2023). Consequently, applying general LLMs in academic fields remains a significant challenge.

To address the limitations of off-the-shelf models, augmented-generation techniques have been developed and widely implemented; they enable LLMs to produce more accurate and up-to-date responses by retrieving relevant resources from databases or by retraining models with specialized materials. Common approaches include retrieval-augmented generation (RAG) and the more recent cache-augmented generation (CAG). RAG dynamically loads context from storage sources (such as databases or object storage), generating a response only after locating relevant content through queries, embeddings, and retrieval algorithms (Guo, 2025; Oche et al., 2025). The performance of a RAG application therefore depends not only on the quality of the context but also on the embedding and search algorithms chosen during system design. In contrast, CAG preloads the entire context into memory and reuses key-value (KV) caches within the model's context window (Agrawal, 2025), a mechanism similar to the prompt caching recently offered for ChatGPT and Claude. However, CAG is constrained by the model's token limits (typically ranging from 32k to 100k tokens); although this capacity is substantial, it remains finite, restricting scalability to very large datasets. To combine the advantages of both, hybrid methods that integrate RAG and CAG have emerged. For instance, initial simple queries can be answered immediately from CAG's preloaded cache, while subsequent in-depth queries leverage RAG's dynamic retrieval. This integration manages memory constraints, optimizes use of the limited context window, and keeps latency low in real-time applications.
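The routing logic described above (CAG cache first, RAG retrieval as fallback) can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the class name, the coverage threshold, and the token-overlap stand-in for embedding similarity are all assumptions introduced here.

```python
# Hybrid RAG-CAG routing sketch: a small core corpus is preloaded as static
# context (the CAG side); queries it cannot cover fall back to dynamic
# retrieval over a larger external store (the RAG side). The token-overlap
# scorer is a toy stand-in for a real embedding-based retriever.
from dataclasses import dataclass


@dataclass
class HybridAugmenter:
    core_corpus: list[str]       # preloaded, always within the context window
    external_store: list[str]    # larger collection, searched on demand

    @staticmethod
    def _overlap(query: str, doc: str) -> int:
        # Stand-in for embedding similarity: count shared lowercase tokens.
        return len(set(query.lower().split()) & set(doc.lower().split()))

    def build_context(self, query: str, threshold: int = 2) -> dict:
        # 1) CAG path: answer from the preloaded cache when it covers the query.
        best_core = max(self.core_corpus, key=lambda d: self._overlap(query, d))
        if self._overlap(query, best_core) >= threshold:
            return {"route": "CAG", "context": best_core}
        # 2) RAG path: retrieve the best match from the external store.
        best_ext = max(self.external_store, key=lambda d: self._overlap(query, d))
        return {"route": "RAG", "context": best_ext}
```

In a production system the overlap scorer would be replaced by vector similarity over embeddings, and the CAG branch would reuse the model's KV cache rather than re-reading the core corpus on every call.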

While both RAG and CAG depend heavily on high-quality, domain-specific knowledge, practical applications of these technologies have so far been largely limited to corporations and professional services. These implementations typically involve publicly available data sources or generic knowledge bases in particular industries, such as enterprise services, discovery search, and medicine (Xu et al., 2024; Bevara et al., 2025; Gan, 2025). However, there is a noticeable absence of work combining these techniques with proprietary academic resources. Existing frameworks such as RAGAS, DeepEval, and TruLens assess LLM performance, but evaluations tailored to academic sources are scarce, despite those sources' reliance on verifiable ground truth for accuracy and reliability.

This paper aims to extend the integration of RAG and CAG pipelines with existing LLMs, enabling them to offer more domain-specific and up-to-date responses within academic fields, and to provide rational evaluation criteria. It also seeks to demonstrate new possibilities for LLMs in academic services. Specifically, academic libraries and similar technical agencies, as hubs of academic resources, have the potential to develop tailor-made, shared collaborative systems for researchers. In this study, we take a linguistics research project as an example: we organize and address some of the field's controversial academic concepts as material for a hybrid RAG-CAG model built upon the Deepseek LLM provided by the university library. The model is optimized to assist in cross-comparing and analyzing various scholars' definitions across a broad body of literature, thus mitigating common defects of traditional large models, such as truncating relevant literature or making speculative judgments. Technically, the key literature corpus is preloaded into the model to integrate domain knowledge via CAG, while RAG dynamically loads new academic outcomes to facilitate knowledge updating without retraining. The resulting hybrid RAG-CAG model demonstrates improved performance in handling domain-specific linguistic queries by combining CAG's efficient, preloaded context access with RAG's dynamic retrieval capabilities. In addition, this article integrates evaluation frameworks commonly used for general tasks, such as RAGAS, DeepEval, and TruLens, and proposes the Research Evaluation For Augmentation Framework (REAF), which emphasizes the ground truth and contextual relevance of academic sources to address the scarcity of specialized evaluations.
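The two axes REAF emphasizes, agreement with verifiable ground truth and contextual relevance of retrieved academic sources, suggest a simple weighted score. The sketch below is a hypothetical illustration of that idea only; the function names, weights, and token-recall proxy are assumptions made here and do not describe the published framework.

```python
# Toy REAF-style scorer: blends (1) faithfulness to a verifiable ground-truth
# answer with (2) how well the retrieved contexts support the generated
# answer. Token recall stands in for semantic similarity.

def _token_recall(reference: str, candidate: str) -> float:
    """Fraction of reference tokens that also appear in the candidate."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 0.0


def reaf_score(answer: str, ground_truth: str, contexts: list[str],
               w_truth: float = 0.6, w_context: float = 0.4) -> float:
    """Weighted blend of ground-truth faithfulness and context relevance."""
    truth = _token_recall(ground_truth, answer)
    # Context relevance: how much of the answer any retrieved source supports.
    relevance = max((_token_recall(answer, c) for c in contexts), default=0.0)
    return w_truth * truth + w_context * relevance
```

A real evaluation would use LLM-judged or embedding-based metrics, as RAGAS and DeepEval do, but the same two-axis weighting applies.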

Keywords

Large Language Models (LLMs)

Retrieval-Augmented Generation (RAG)

Cache-Augmented Generation (CAG)

Academic Libraries

Domain-Specific Knowledge

Linguistics Research

Hybrid AI Models