AI Tools Directory
1,173 tools
FELM
a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).
CompMix
a benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxe
Berkeley Function-Calling Leaderboard
evaluates LLMs' ability to call external functions/tools.
AlpacaEval
An Automatic Evaluator for Instruction-following Language Models using the Nous benchmark suite.
MovieLens-1M
dataset, embodying varied social traits and preferences.


Frea Buckler ~ Artist
works used to create this network [(19) derrick has started yet another project on Twitter: "Just sent @buntworthy a dem
Confluence
a generative art project by Devi Parikh on BrainDrops.
Computer Vision Art Gallery : CVPR 2021
artworks dealing with computer vision technologies
LAION
Large-scale Artificial Intelligence Open Network
Carolina
General Corpus of Contemporary Brazilian Portuguese with provenance and typology information - Corpus Geral do Português
Taskbase
Virtual assistants packaged with AI-powered software.
M3CoT
a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural
OneKE
A bilingual Chinese-English knowledge extraction model with knowledge graphs and natural language processing technologie
AutoGen | Microsoft
multi-agent conversation framework as a high-level abstraction by Microsoft [[github](https://github.com/microsoft/autog
ChatArena
building multi-agent environments for LLMs
Eden AI
provides a single API connected to multiple AI engines
LiveBench
A Challenging, Contamination-Free LLM Benchmark.
Evaluating LLMs is a minefield
talk by Princeton professor Arvind Narayanan
LLM Use Case Leaderboard
a leaderboard that features LLM use cases.
LMExamQA
a leaderboard that benchmarks foundation models with Language-Model-as-an-Examiner.
Marvin
AI engineering framework for building natural language interfaces