T2-RAGBench Submission Guidelines

We welcome submissions to the T2-RAGBench leaderboard! You can submit results for:

Retrieval only: Provide ranked context results
Generation only: Predict answers from provided context
Retrieval-Augmented Generation (RAG): Combine both steps

You may submit results for any subset. We will calculate the applicable metrics for you: Mean Reciprocal Rank (MRR) for retrieval, Number Match (NM) for generation, or both for RAG.
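For orientation, here is a minimal sketch of the textbook definition of Mean Reciprocal Rank. It assumes one gold context per question and no rank cutoff. The function name and input layout are hypothetical, and the organizers' scoring code may differ. Number Match is not sketched because these guidelines do not state its matching rule.

```python
def mean_reciprocal_rank(ranked_ids_per_question, gold_id_per_question):
    """Average of 1/rank of the gold context; a question scores 0 if it is not retrieved.

    ranked_ids_per_question: dict mapping question ID -> list of context IDs, best first
    gold_id_per_question:    dict mapping question ID -> the single correct context ID
    (Both layouts are assumptions made for this illustration.)
    """
    if not gold_id_per_question:
        return 0.0
    total = 0.0
    for question_id, gold_id in gold_id_per_question.items():
        ranked = ranked_ids_per_question.get(question_id, [])
        if gold_id in ranked:
            # list.index is 0-based, ranks are 1-based
            total += 1.0 / (ranked.index(gold_id) + 1)
    return total / len(gold_id_per_question)
```

With this definition, a gold context ranked first contributes 1, one ranked third contributes 1/3, and a missing one contributes 0.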

📧 Submission Email

Send to: t2ragbench@gmail.com

📌 Email Subject Format

Use this format:

[Date]_[Generator]_[Retriever]_[RetrievalMethod]_[Subset]

Date: Format YYYY-MM-DD
Generator: LLM used (e.g., LLaMA3-70B)
Retriever: e.g., E5-Large
RetrievalMethod: e.g., HybridBM25, SumContext
Subset: e.g., FinQA or all

Examples:

2025-06-05_LLaMA3_E5-Large_HybridBM25_all
2025-06-05_LLaMA3_E5-Large_HybridBM25_FinQA
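
If you generate subject lines programmatically, a small helper along these lines may be useful. It is only a sketch: `build_subject` is a hypothetical name, and rejecting underscores inside a field is our assumption (it keeps the five fields unambiguous), not a stated rule.

```python
from datetime import date


def build_subject(generator, retriever, retrieval_method, subset, day=None):
    """Join the five subject fields with underscores, date first (YYYY-MM-DD)."""
    day = day or date.today()
    fields = [day.isoformat(), generator, retriever, retrieval_method, subset]
    for field in fields[1:]:
        # Assumption: fields must be non-empty and must not contain the separator.
        if not field or "_" in field:
            raise ValueError(f"invalid subject field: {field!r}")
    return "_".join(fields)


print(build_subject("LLaMA3", "E5-Large", "HybridBM25", "FinQA", date(2025, 6, 5)))
```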

📎 Attachments

ZIP file with .json result files
Optional: paper or code links
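
One possible way to bundle the result files into a ZIP attachment, using only Python's standard library. The function name and the file names in the usage note are placeholders, and storing the files flat at the archive root is an assumption.

```python
import zipfile
from pathlib import Path


def zip_results(json_paths, zip_path):
    """Write the given result files into one ZIP archive, without directory prefixes."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, json_paths):
            # arcname=path.name drops the local directory structure
            archive.write(path, arcname=path.name)
    return Path(zip_path)
```

For example, `zip_results(["finqa_results.json"], "submission.zip")` would produce a `submission.zip` containing that single file.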

📄 Submission File Format

Your submission should be a .json file containing one JSON object per line (JSON Lines layout). Each object has these fields:

id: Question ID (matches benchmark)
subset: One of FinQA, ConvFinQA, VQAonBD, TAT-DQA
context_id: Retrieved or oracle context ID
prediction: Final numeric answer, written as a string (e.g., "41932.20")

Example:

{"id": "finqa_train_2", "subset": "FinQA", "context_id": "finqa_train_ctx_1877", "prediction": "41932.20"}
{"id": "convfinqa_0", "subset": "ConvFinQA", "context_id": "convfinqa_ctx_99", "prediction": "206588.0"}
{"id": "va_qa_6", "subset": "VQAonBD", "context_id": "va_ctx_1493", "prediction": "-12.0"}
{"id": "tatqa_train_0", "subset": "TAT-DQA", "context_id": "d58da8b044c0221e4ad5fb3c60a50486", "prediction": "216.0"}

Ensure the file contains all required fields and covers the relevant subset(s).
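
Before sending, you may want to check your results locally. The sketch below validates each record against the four fields listed above and serializes the records as one JSON object per line. The function names are hypothetical. Treating all four fields as mandatory for every submission type and requiring `prediction` to parse as a number are our assumptions; this is not the organizers' validator.

```python
import json

REQUIRED_FIELDS = ("id", "subset", "context_id", "prediction")
SUBSETS = {"FinQA", "ConvFinQA", "VQAonBD", "TAT-DQA"}


def validate_record(record):
    """Return a list of problems found in one result object (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if "subset" in record and record["subset"] not in SUBSETS:
        problems.append(f"unknown subset: {record['subset']}")
    if "prediction" in record:
        try:
            float(record["prediction"])  # assumption: predictions must parse as numbers
        except (TypeError, ValueError):
            problems.append(f"prediction is not numeric: {record['prediction']!r}")
    return problems


def to_json_lines(records):
    """Serialize records as one JSON object per line; raise on the first invalid record."""
    lines = []
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            raise ValueError(f"record {index}: {'; '.join(problems)}")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with UTF-8 encoding yields a file in the layout shown in the example.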

📅 Review Process

Submissions are reviewed within 10 days
Accepted entries are added to the leaderboard

🔒 Privacy

All submissions remain confidential. Data will not be shared or published without consent.

📬 Questions? Email us at t2ragbench@gmail.com