AI Tools Directory

1,173 tools

Arthur Shield

A paid product for detecting toxicity, hallucination, prompt injection, etc.

Paid

Puzzlet AI

The Git-Based LLM Engineering Platform. Achieve more from GenAI: manage, evaluate, and improve your full-stack LLM appli…

Freemium

PromptLayer 🍰

Prompt engineering platform. Collaborate, test, evaluate, and monitor your LLM applications.

Freemium

PromptHub

Full-stack prompt management tool designed to be usable by both technical and non-technical team members. Test, version, coll…

Freemium

Parea AI

Platform and SDK for AI engineers providing tools for LLM evaluation, observability, and a version-controlled enhanced p…

Freemium

Manag.ai

Your all-in-one prompt management and observability platform. Craft, track, and perfect your LLM prompts with ease.

Freemium

Izlo

Prompt management tools for teams. Store, improve, test, and deploy your prompts in one unified workspace.

Freemium

Epsilla

An all-in-one platform to create vertical AI agents powered by your private data and knowledge.

Freemium

Dataoorts

Enjoy unlimited API calls with Serverless AI Workers/LLMs for just $25 per month. No rate or concurrency limits.

Freemium

ClevAgent

Runtime monitoring for AI agents: heartbeat watchdog, loop detection, cost tracking, auto-restart. Python SDK or HTTP A…

Freemium

Open Responses

Serverless open-source platform for building long-running LLM agents with tool use.

Free

Cohere Summarize Beta

Introducing Cohere Summarize Beta: A New Endpoint for Text Summarization

Freemium

Emergent Mind

The latest AI news, curated & explained by GPT-4.

Freemium

The Chinese Book for Large Language Models

An Introductory LLM Textbook Based on [*A Survey of Large Language Models*](https://arxiv.org/abs/2303.18223).

Freemium

Guardrails.ai

A Python library for validating outputs and retrying failures. Still in alpha, so expect sharp edges and bugs.

Freemium

Tune Studio

Playground for developers to fine-tune and deploy LLMs.

Freemium

We-Math

A benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.

Freemium

SuperLim

A Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such…

Freemium

SuperBench

A benchmark platform designed for evaluating large language models (LLMs) on a range of tasks, particularly focusing on…

Freemium

SciBench

A benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from d…

Freemium

MMToM-QA

A multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs…

Freemium

MMedBench

A benchmark that evaluates large language models' ability to answer medical questions across multiple languages.

Freemium

MathEval

A comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nea…

Freemium

LLMEval

Focuses on understanding how these models perform in various scenarios and analyzing results from an interpretability pe…

Freemium