AI Tools Directory
1,173 tools
Arthur Shield
A paid product for detecting toxicity, hallucination, prompt injection, etc.
Puzzlet AI
The Git-Based LLM Engineering Platform. Achieve more from GenAI: Manage, evaluate, and improve your full-stack LLM appli
PromptLayer 💰
Prompt Engineering platform. Collaborate, test, evaluate, and monitor your LLM applications
PromptHub
Full stack prompt management tool designed to be usable by technical and non-technical team members. Test, version, coll
Parea AI
Platform and SDK for AI Engineers providing tools for LLM evaluation, observability, and a version-controlled enhanced p
Manag.ai
Your all-in-one prompt management and observability platform. Craft, track, and perfect your LLM prompts with ease.
Izlo
Prompt management tools for teams. Store, improve, test, and deploy your prompts in one unified workspace.
Epsilla
An all-in-one platform to create vertical AI agents powered by your private data and knowledge.
Dataoorts
Enjoy unlimited API calls with Serverless AI Workers/LLMs for just $25 per month. No rate or concurrency limits.
ClevAgent
Runtime monitoring for AI agents: heartbeat watchdog, loop detection, cost tracking, auto-restart. Python SDK or HTTP A
Open Responses
Serverless open-source platform for building long-running LLM agents with tool use.
Cohere Summarize Beta
A new Cohere endpoint for text summarization, currently in beta.
Emergent Mind
The latest AI news, curated & explained by GPT-4.
The Chinese Book for Large Language Models
An Introductory LLM Textbook Based on [*A Survey of Large Language Models*](https://arxiv.org/abs/2303.18223).
Guardrails.ai
A Python library for validating outputs and retrying failures. Still in alpha, so expect sharp edges and bugs.
Tune Studio
Playground for devs to finetune & deploy LLMs
We-Math
A benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.
SuperLim
A Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such
SuperBench
A benchmark platform designed for evaluating large language models (LLMs) on a range of tasks, particularly focusing on
SciBench
A benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from d
MMToM-QA
A multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs
MMedBench
A benchmark that evaluates large language models' ability to answer medical questions across multiple languages.
MathEval
A comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nea
LLMEval
Focuses on understanding how LLMs perform in various scenarios and analyzing results from an interpretability pe