Query Expansion for Better Query Embedding using LLMs
Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)"
SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset.
Evaluation of BEIR Datasets using ColBERT retrieval model
A general RAG search chatbot with SoTA RAG techniques such as HyDE, hybrid retrieval with BM25 + RRF, and cross-encoder reranking. Evaluated on the BEIR SciFact dataset, comparing all the different pipelines I tried along the way.
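The RRF step mentioned above is a simple rank-based fusion rule: each ranker contributes 1/(k + rank) to a document's score. A minimal sketch, assuming two ranked lists of document IDs (the IDs and the constant k=60 are illustrative):

```python
# Hypothetical sketch of Reciprocal Rank Fusion (RRF) over ranked lists
# of document IDs, e.g. one from BM25 and one from a dense retriever.
def rrf_fuse(ranked_lists, k=60):
    """Fuse rankings: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d2"]
dense_hits = ["d3", "d2", "d4"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the lexical and dense retrievers.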
Physics-Inspired Reranking via Token-Level Point Clouds & PDE Fusion | NFCorpus NDCG@10 = 0.3232 (+47.2%) | 26ms CPU | Zero training
The only way to accumulate, share, and stream knowledge seamlessly into your LLM work sessions. Peer-to-peer shared knowledge network — continuous shared learning across machines. Knowledge planes: everlastingly up-to-date, globally accessible, at lightning speed. libp2p, W3C did:key, local-first, MIT. 75.22% NDCG@10 on BEIR SciFact.
RAG evaluation on BEIR SciFact: BM25, dense and hybrid retrieval with LLM answers.
BM25 & SBERT retrieval on the FiQA-2018 financial QA benchmark · Gradio demo
Scripts to convert the LegalBench-RAG dataset into the standard IR format
Browser-side IR benchmark: BM25 vs Semantic vs Hybrid retrieval on SciFact (BEIR). Bauman MSTU NIR 2026.
A RAG system that replaces standard BM25/FAISS retrieval with a fully learned neural retrieval stack - including a fine-tuned bi-encoder, a cross-encoder reranker, ColBERT-style late interaction scoring, and a locally hosted LLM generator. Built entirely with free and open-source tools.
Controlled depth ablation of a BERT bi-encoder across training budgets and seeds on three BEIR tasks (nfcorpus, scifact, fiqa). L3–L12 is flat within seed noise at 20K steps; 80K training degrades every depth on zero-shot transfer (−45% NDCG@10 on fiqa for L12).
Given a set of documents and a minimum required similarity threshold, find the number of document pairs that exceed the threshold
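A brute-force sketch of that pair-counting task, assuming documents are given as a row-wise embedding matrix and similarity is cosine (both are assumptions; the function and variable names are illustrative):

```python
import numpy as np

def count_similar_pairs(embeddings, threshold):
    """Count unordered document pairs with cosine similarity > threshold."""
    # Normalize rows so plain dot products become cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    # Strict upper triangle counts each pair exactly once, skipping self-pairs.
    iu = np.triu_indices(len(embeddings), k=1)
    return int((sims[iu] > threshold).sum())

docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
n = count_similar_pairs(docs, 0.8)  # only the first two docs are similar
```

The all-pairs matrix is O(n²) in memory, so for large corpora a blocked or approximate-nearest-neighbor variant would be needed.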
Retrieve the top-k documents with respect to a given query by maximal inner product over dense and sparse vectors
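For the dense side, exhaustive maximal-inner-product top-k retrieval reduces to one matrix-vector product and a partial sort. A minimal sketch, assuming a dense query vector and a document matrix (a sparse matrix would score the same way via its own `@` operator):

```python
import numpy as np

def top_k_inner_product(query, doc_vectors, k):
    """Return indices and scores of the k documents with the largest
    inner product against the query."""
    scores = doc_vectors @ query      # one inner product per document row
    top = np.argsort(-scores)[:k]     # indices of the k largest scores
    return top, scores[top]

docs = np.array([[0.2, 0.8],
                 [0.9, 0.1],
                 [0.5, 0.5]])
ids, scores = top_k_inner_product(np.array([1.0, 0.0]), docs, k=2)
```

Exhaustive scoring is exact but linear in corpus size; production systems typically swap in an ANN index (e.g. FAISS) for the dense part while keeping the same inner-product scoring.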
Comparative study of transformer and non-transformer encoder architectures for dense retrieval — mapping the latency × accuracy Pareto frontier on BEIR. Solo research, PES University CSE 2026.