AI Tools Directory
6,396 tools
Harmonai's Dance Diffusion
Open-Source AI Audio Generation Tool For Music Producers
Taskbase
Virtual assistants packaged with AI-powered software.
FramePack
A next-frame prediction neural network structure that generates videos progressively.
Emergent Mind
The latest AI news, curated & explained by GPT-4.
The Chinese Book for Large Language Models
An Introductory LLM Textbook Based on [*A Survey of Large Language Models*](https://arxiv.org/abs/2303.18223).
BUILD GPT: HOW AI WORKS
Explains how to code a Generative Pre-trained Transformer (GPT) from scratch.
Alexander Rush Series
High-quality educational materials you don't want to miss.
Arthur Shield
A paid product for detecting toxicity, hallucination, prompt injection, etc.
Weights & Biases
Machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
Guardrails.ai
A Python library for validating outputs and retrying failures. Still in alpha, so expect sharp edges and bugs.
Tune Studio
Playground for devs to finetune & deploy LLMs
WHOOPS!
a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations
We-Math
a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.
VisualWebArena
a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.
TAT-DQA
a large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly
SuperLim
a Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such
SuperBench
a benchmark platform designed for evaluating large language models (LLMs) on a range of tasks, particularly focusing on
SciBench
a benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from d
PubMedQA
a biomedical question-answering benchmark designed for answering research-related questions using PubMed abstracts.
OlympicArena
a benchmark for evaluating AI models across multiple academic disciplines like math, physics, chemistry, biology, and mo
MMToM-QA
a multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs
MMedBench
a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.
MathEval
a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nea
LLMEval
focuses on understanding how these models perform in various scenarios and analyzing results from an interpretability pe