Detect hallucinations and regressions in the quality of your LLMs
One of the key features of Traceloop is the ability to monitor the quality of your LLM outputs in real time. It helps you detect hallucinations and regressions in the quality of your models and prompts.
To start monitoring your LLM outputs, make sure you have installed OpenLLMetry and configured it to send data to Traceloop. If you haven’t done that yet, follow the instructions in the Getting Started guide.
Next, if you’re not using a supported LLM framework, make sure to annotate workflows and tasks.
A monitor is an evaluator that runs in real time on a defined group of spans with specific characteristics. For every span that matches the group filter, it runs the evaluator and logs the monitor result. This allows you to continuously assess the quality and performance of your LLM outputs as they are generated in production.
Monitors can use two types of evaluators:
LLM-as-a-Judge: uses a large language model to evaluate outputs based on semantic qualities. You can create custom evaluators with this method by writing prompts that capture your own criteria.
Traceloop built-in evaluators: deterministic evaluations for structural validation, safety checks, and syntactic analysis.
All monitors connect to our comprehensive Evaluators library, allowing you to choose the right evaluation approach for your specific use case.
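To make the monitor model described above more concrete, here is a minimal conceptual sketch in plain Python. It is not Traceloop's API or implementation: the names `Span`, `Monitor`, `MonitorResult`, `on_span`, and `is_valid_json` are all hypothetical and exist only for illustration. It shows the idea of a group filter that selects spans, a deterministic evaluator (here, a structural check that the output is valid JSON) that runs on each matching span, and a logged result per match.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    """Hypothetical stand-in for a traced LLM call."""
    name: str
    attributes: Dict[str, str]
    output: str


@dataclass
class MonitorResult:
    """Hypothetical record of one evaluator run."""
    span_name: str
    passed: bool


@dataclass
class Monitor:
    """Illustrative monitor: a span filter plus an evaluator."""
    span_filter: Callable[[Span], bool]
    evaluator: Callable[[str], bool]
    results: List[MonitorResult] = field(default_factory=list)

    def on_span(self, span: Span) -> None:
        # Only spans that match the group filter are evaluated.
        if self.span_filter(span):
            passed = self.evaluator(span.output)
            self.results.append(MonitorResult(span.name, passed))


def is_valid_json(text: str) -> bool:
    """Deterministic structural check: does the output parse as JSON?"""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Assumed example data: two spans from a "summarize" workflow, one from another.
monitor = Monitor(
    span_filter=lambda s: s.attributes.get("workflow") == "summarize",
    evaluator=is_valid_json,
)
spans = [
    Span("llm.call", {"workflow": "summarize"}, '{"summary": "ok"}'),
    Span("llm.call", {"workflow": "summarize"}, "not json"),
    Span("llm.call", {"workflow": "translate"}, "not json"),
]
for span in spans:
    monitor.on_span(span)

for result in monitor.results:
    print(result.span_name, "passed" if result.passed else "failed")
```

In this sketch the third span is skipped because it does not match the filter, so only two results are logged. An LLM-as-a-Judge evaluator would fit the same shape by replacing `is_valid_json` with a function that sends the output and your criteria to a model and interprets its verdict.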