An AI-powered knowledge base for scientific abstracts: a case study on environmental DNA (eDNA) in biomonitoring

Date
2025-12-05
Publisher
Plovdiv University Press "Paisii Hilendarski"
Abstract
Environmental DNA (eDNA) refers to genetic material shed by organisms into their environment, such as water, soil, or air. As a non-invasive biomonitoring method, eDNA has revolutionized biodiversity assessment by enabling species detection without direct observation or capture. This approach is especially valuable for tracking invasive, elusive, or endangered species and for monitoring ecosystem changes driven by climate or anthropogenic pressures. Over the past decade, a growing body of scientific literature has explored eDNA applications, producing a rich but fragmented landscape of domain-specific knowledge that researchers and policymakers find increasingly difficult to navigate. To address this, we developed BioTrace, an AI-powered knowledge base that supports conversational exploration of scientific abstracts on eDNA in biodiversity monitoring. BioTrace uses a Retrieval-Augmented Generation (RAG) architecture, integrating the mistral-saba-24b large language model (LLM) via the Groq API for ultra-fast, low-latency inference. Scientific abstracts are indexed in a vector store, and retrieved passages are reranked with the all-MiniLM-L6-v2 model to improve answer relevance. Users can query the system in natural language and receive grounded, context-aware responses that synthesize findings across multiple studies. To date, the knowledge base contains more than 4,000 abstracts on eDNA studies. This work demonstrates the potential of LLMs to distil scientific literature into accessible, structured knowledge. By giving users real-time, interpretable insights into eDNA research, BioTrace serves as a blueprint for future AI-based tools in the ecological and environmental sciences.
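The retrieve-then-rerank flow behind a RAG system of this kind can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not BioTrace's implementation: a toy bag-of-words vector (`embed`) stands in for a real embedding model, the hypothetical `rerank` scorer stands in for all-MiniLM-L6-v2 similarity scoring, and the call to mistral-saba-24b through the Groq API is omitted, so the sketch stops at the grounded prompt that would be sent to the model. The three abstracts are invented placeholders, not records from the knowledge base.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption): a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Minimal in-memory index of (doc_id, text, vector) entries (hypothetical)."""

    def __init__(self) -> None:
        self.entries = []

    def add(self, doc_id: str, text: str) -> None:
        self.entries.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int) -> list:
        """Stage 1 (recall): return the k entries most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


def rerank(query: str, candidates: list, top_n: int) -> list:
    """Stage 2 (precision): rescore only the shortlisted candidates.

    Placeholder scorer (assumption): share of distinct query terms found in the
    passage, with cosine similarity as a tie-breaker.
    """
    q_vec = embed(query)
    q_terms = set(q_vec)

    def score(item):
        p_vec = embed(item[1])
        coverage = len(q_terms & set(p_vec)) / len(q_terms) if q_terms else 0.0
        return (coverage, cosine(q_vec, p_vec))

    return sorted(candidates, key=score, reverse=True)[:top_n]


def build_prompt(query: str, passages: list) -> str:
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the abstracts below and cite their IDs.\n\n"
        f"Abstracts:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    store = VectorStore()
    store.add("A1", "eDNA metabarcoding of river water detected invasive crayfish.")
    store.add("A2", "Soil eDNA surveys track earthworm diversity under land-use change.")
    store.add("A3", "Airborne eDNA sampled mammal species in a zoo.")
    query = "Can river eDNA detect invasive species?"
    shortlist = store.search(query, k=3)
    top = rerank(query, shortlist, top_n=2)
    print(build_prompt(query, top))
```

In a real deployment, `embed` would typically be replaced by sentence embeddings (for example, all-MiniLM-L6-v2 loaded through the sentence-transformers library) and `VectorStore` by a dedicated vector database. The usual reason for splitting retrieval into two stages is that a costlier, more accurate scorer then only has to examine a short candidate list.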
Keywords
AI, LLMs, RAG, eDNA, biomonitoring