Submission 89
GraphRAG for Plant Knowledge Extraction from Chinese Local Gazetteers
D1_TPoster-08
Presented by: Hui Li
Chinese local gazetteers (a.k.a. difangzhi, Chinese: 方志) are historical records that have been continually written or compiled by local officials and scholars, covering a variety of topics such as local products, landscape, population, economy, and culture over time [Li & Li, 2022]. Jiangsu Province in the south of China has a history of over 2,000 years of compiling local gazetteers, and the gazetteer collection of regional products owned by Nanjing Agricultural University in Jiangsu is one of the largest gazetteer repositories in China.
Nowadays, with the support of cutting-edge digitization technology, researchers are increasingly focusing on discovering valuable knowledge from large-scale local gazetteers via text mining strategies. Most previous studies have been dedicated to the extraction and organization of entity information in historical texts, particularly mentions and descriptions of persons, plants, and locations. For example, Chen et al. [2020] developed LoGaRT, a suite of digital tools for the retrieval and analysis of Chinese local gazetteers; Liu et al. [2022] applied natural language processing techniques to extract names of individuals and locations, along with their relations. However, relatively few studies have attempted to further interpret and leverage the implicit knowledge about local products (e.g., plants and animals) within these gazetteers [Li, 2021].
Rice is a staple food in southern China, with paddy rice serving as one of the primary crops in Jiangsu Province. Local gazetteers record a variety of rice cultivars across most rice-growing regions of China, including Jiangsu Province [Xia et al., 2024]. Each record in the digitized collection mainly consists of a rice name, the local gazetteer in which the rice is recorded, the time when the gazetteer was published, and the corresponding rice description. Extracting rice information from gazetteers can provide researchers with the unique attributes of special rice varieties and the cultivation history of our ancestors.
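To make the record structure described above concrete, the following is a minimal Python sketch, not the schema actually used in this work. The class name `RiceRecord`, its field names, the helper `group_by_name`, and all sample values are hypothetical placeholders, and the publication time is assumed to be normalized to an integer year.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RiceRecord:
    rice_name: str     # name of the rice cultivar as written in the gazetteer
    gazetteer: str     # title of the gazetteer containing the record
    year: int          # publication time, assumed here to be an integer year
    description: str   # descriptive text attached to the rice name


def group_by_name(records):
    """Map each rice name to its records, sorted by publication year."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.rice_name].append(record)
    return {
        name: sorted(items, key=lambda r: r.year)
        for name, items in grouped.items()
    }


# Placeholder records for illustration only; not taken from the collection.
records = [
    RiceRecord("cultivar A", "Gazetteer Y", 1750, "placeholder description 2"),
    RiceRecord("cultivar A", "Gazetteer X", 1600, "placeholder description 1"),
    RiceRecord("cultivar B", "Gazetteer X", 1600, "placeholder description 3"),
]

history = group_by_name(records)
for record in history["cultivar A"]:
    print(record.year, record.gazetteer)
```

Grouping records by rice name and ordering them by year is one simple way to trace how a cultivar is described across gazetteers over time.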
In this study, we develop a knowledge-based chatbot system utilizing Generative AI technologies based on rice records in 544 local gazetteers from Jiangsu Province spanning more than eight centuries. This system aims to introduce rice knowledge to the public and serve as a valuable reference for specialized rice breeding. It consists of three key components: “Rice History”, “Rice Chatbot”, and “Land of Rice and Fish”. Each component is briefly introduced as follows.
Rice History: in this component, textual descriptions from local gazetteers, folklore records, and biographies are selected and displayed on the webpage. By reading these materials or listening to the corresponding AI-generated audio (played by clicking a mascot figure of Nanjing Agricultural University), users can gain a vivid understanding of Jiangsu rice history and culture.
Rice Chatbot: this component consists of two parts, i.e., a knowledge graph of rice cultivars and a retrieval-augmented generation (RAG) module powered by large language models (LLMs), which together provide users with an interactive and pleasant human-computer interaction. In the graph module, nodes represent rice names and attributes, and edges represent relations between entities. When users raise a rice-related question, the chatbot identifies the nodes and paths in the graph most closely associated with the query, and then returns the targeted outputs to users. In the RAG module, the textual descriptions and the internal knowledge of LLMs enable the system to integrate general knowledge with rice domain knowledge during the dialogue. Thus, this platform not only addresses specialized questions on rice cultivation, but also provides broader insights by incorporating knowledge from other related areas.
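The two-part design described above can be illustrated with a minimal Python sketch under stated assumptions; it is not the system's actual implementation. The triples, relation names, descriptions, and the functions `build_graph`, `retrieve_paths`, and `build_prompt` are all hypothetical. Node matching is done here by simple substring lookup, only one-hop paths are returned, and the call to an LLM is left out.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples; placeholders, not gazetteer data.
TRIPLES = [
    ("cultivar A", "recorded_in", "Gazetteer X"),
    ("cultivar A", "has_trait", "early maturing"),
    ("cultivar B", "recorded_in", "Gazetteer Y"),
]

# Hypothetical textual descriptions keyed by rice name.
DESCRIPTIONS = {
    "cultivar A": "placeholder description of cultivar A",
    "cultivar B": "placeholder description of cultivar B",
}


def build_graph(triples):
    """Store each edge in both directions so any mentioned node can be expanded."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((f"inverse_{relation}", head))
    return graph


def retrieve_paths(graph, question):
    """Return one-hop paths starting from nodes whose names appear in the question."""
    text = question.lower()
    paths = []
    for node, edges in graph.items():
        if node.lower() in text:
            paths.extend((node, relation, neighbor) for relation, neighbor in edges)
    return paths


def build_prompt(question, paths, descriptions):
    """Combine graph facts and matching descriptions into a prompt for an LLM."""
    facts = "\n".join(f"- {h} --{r}--> {t}" for h, r, t in paths)
    heads = {h for h, _, _ in paths}
    passages = "\n".join(
        f"- {text}" for name, text in descriptions.items() if name in heads
    )
    return (
        "Answer the question using the graph facts and passages below.\n"
        f"Graph facts:\n{facts}\n"
        f"Passages:\n{passages}\n"
        f"Question: {question}"
    )


graph = build_graph(TRIPLES)
question = "What traits does cultivar A have?"
paths = retrieve_paths(graph, question)
print(build_prompt(question, paths, DESCRIPTIONS))
```

In this sketch the graph supplies precise, structured facts, while the retrieved passages give the language model surrounding context to draw on when composing an answer.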
Land of Rice and Fish: this component is composed of three elements: the “Rice Map”, “Rice Chart”, and “Rice Game”. While appreciating the beauty of data, users can gain an in-depth understanding of rice-related knowledge. Rice Map is visualized as a fish-shaped hex map that approximates the geographical outline of Jiangsu Province. Users can click on individual hexagons to view specific rice varieties and read the corresponding descriptions from local gazetteers. Rice Chart displays the key statistics of rice varieties in Jiangsu via charts on an animated painting scroll. Rice Game is an interactive game in which players are guided through matching challenges that incorporate rice knowledge, such as key traits, cultivation environment, and phenotypes. With a well-designed interface and simple rules, this game offers a delightful user experience, aiming to promote public awareness of and interest in rice science, particularly among young users.
This system serves as an effective tool for the retrieval and analysis of plant knowledge within collections of Chinese local gazetteers. Users interested in traditional Chinese culture, as well as domain experts in need of resources on local rice varieties, can find the information they need in this system.