Traceloop is joining ServiceNow
March 2026
Read more →
CEO @ Traceloop. Ex-Google, formerly chief architect at Fiverr. Over 15 years of experience building software. M.Sc. in CS from the Hebrew University of Jerusalem.
@nir_ga
November 2025

Nir Gazit
Co-Founder and CEO
The article highlights the need for specialized LLM observability platforms to manage non-deterministic behavior, unpredictable costs, and performance issues in LLM applications. Built on OpenTelemetry (via the OpenLLMetry extension), solutions like Traceloop provide automatic instrumentation, AI-specific metrics (such as token usage, latency, and RAG quality), full trace visibility, and reproducible test cases. This approach enables real-time debugging, granular cost control, and continuous monitoring without vendor lock-in—helping teams engineer reliable AI with transparency and flexibility.
Read more →
November 2025

Nir Gazit
Co-Founder and CEO
The article explains how LLMs can degrade silently over time due to model and data drift, and argues that teams need an LLM reliability platform—built on observability and automated quality evaluations—to detect issues early, monitor performance, and maintain reliable outputs, especially in complex setups like RAG.
Read more →
November 2025

Nir Gazit
Co-Founder and CEO
This article explains how teams can move from reacting to unpredictable LLM bills to proactively controlling costs by tracking token usage at a granular, per-user level. The key is attaching metadata—such as user_id or feature_name—to every LLM request so costs can be attributed to specific users, features, or teams. Since manually tagging across multiple services is unscalable, organizations increasingly use LLM proxies or OpenTelemetry-based observability frameworks to centralize and automate this data collection. With full traces tying user actions to token spend, teams can visualize which users or features drive costs, investigate anomalies, set alerts, and enforce budgets. Platforms like Traceloop provide this out-of-the-box, turning opaque LLM spend into a transparent and controllable part of the engineering and FinOps workflow.
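The per-user attribution described above can be sketched in a few lines. This is a minimal illustration, not Traceloop's implementation: the record fields (user_id, feature_name, tokens, cost_usd) are hypothetical stand-ins for whatever your observability backend exposes.

```python
from collections import defaultdict

# Hypothetical per-request records, as an observability backend might expose them.
requests = [
    {"user_id": "u1", "feature_name": "chat", "tokens": 1200, "cost_usd": 0.024},
    {"user_id": "u2", "feature_name": "summarize", "tokens": 300, "cost_usd": 0.006},
    {"user_id": "u1", "feature_name": "chat", "tokens": 800, "cost_usd": 0.016},
]

def cost_by(records, key):
    """Aggregate total token count and dollar cost by an attribution key."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for r in records:
        totals[r[key]]["tokens"] += r["tokens"]
        totals[r[key]]["cost_usd"] += r["cost_usd"]
    return dict(totals)

per_user = cost_by(requests, "user_id")
per_feature = cost_by(requests, "feature_name")
```

The same function answers "which user?" and "which feature?" just by swapping the attribution key, which is exactly why attaching that metadata to every request matters.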
Read more →
November 2025

Nir Gazit
Co-Founder and CEO
This article explains why debugging LLMs in production is so challenging due to their non-deterministic behavior and complex pipelines, and outlines how modern teams overcome this with deep observability and reproducible debugging. It emphasizes the need for end-to-end tracing—capturing every prompt, retrieval step, API call, and intermediate output under a unique request ID—to understand where failures originate, especially in architectures like RAG. With full trace context, specialized LLM observability platforms can then “replay” production failures as repeatable test cases, allowing engineers to reliably reproduce issues, iterate on fixes, and validate improvements. Ultimately, robust tracing plus one-click reproduction transforms unpredictable LLM anomalies into systematic, solvable problems.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
Understanding how your large language model performs requires more than basic monitoring. This article explains how OpenTelemetry and Traceloop’s OpenLLMetry provide complete visibility into LLM operations by capturing token usage, latency, and cost within unified traces. It outlines how developers can instrument their applications, visualize metrics in real time, and use Traceloop’s OpenTelemetry-native platform to simplify observability and performance optimization.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
Debugging RAG pipelines without proper visibility can feel like guesswork. This post explains how end-to-end tracing turns that process into data-driven debugging, revealing exactly what happens at every stage of a RAG workflow, from query to retrieval to generation. It highlights why LLM observability platforms matter and how Traceloop delivers complete, out-of-the-box trace visibility that helps teams pinpoint errors, reduce latency, and optimize RAG performance.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
Effective LLM monitoring requires visibility into token usage, latency, and SLOs at the user and feature level. This post explains how to instrument applications with OpenTelemetry and use observability platforms like Traceloop to capture, visualize, and optimize these metrics for cost and performance control.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
LLM performance can degrade quietly over time. This post explains how to set up automated alerts for relevance and other quality metrics by combining continuous LLM evaluation with monitoring tools like Grafana, Datadog, or Traceloop to catch issues before users notice.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
Traces and spans reveal how every part of an LLM application behaves, from API calls to database queries. This post explains how these observability concepts help debug failures, monitor performance, and track costs through detailed, end-to-end visibility.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
Evaluating a RAG pipeline requires measuring retrieval accuracy, generation quality, and operational efficiency. This post explains key metrics such as Context Precision, Context Recall, Faithfulness, and Answer Relevancy, and how modern observability platforms like Traceloop can automatically track and visualize them.
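The retrieval-side metrics have simple set-based definitions. A sketch under the assumption that relevance labels are available per chunk (in practice an evaluator or human annotation provides them; chunk IDs here are made up):

```python
def context_precision(retrieved, relevant):
    """Share of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Share of the relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

retrieved = ["c1", "c2", "c3", "c4"]
relevant = ["c1", "c3", "c5"]
precision = context_precision(retrieved, relevant)  # 2 of 4 retrieved are relevant
recall = context_recall(retrieved, relevant)        # 2 of 3 relevant were retrieved
```

Faithfulness and Answer Relevancy, by contrast, need an evaluator model to judge generated text, which is where automated platforms earn their keep.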
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
Building a reliable RAG application requires more than vector search: it demands a full MLOps stack with an LLM observability and evaluation layer for tracing, prompt management, and continuous quality monitoring across metrics like faithfulness, relevance, and safety.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
LLM performance can quietly degrade over time due to concept and data drift. This post outlines how to build a proactive monitoring strategy using evaluation metrics, pre-deployment gating, and continuous production oversight to catch issues before users do.
Read more →
October 2025

Nir Gazit
Co-Founder and CEO
LLM observability extends traditional monitoring by tracking model quality, cost, and performance to reveal how language models truly behave in production.
Read more →
September 2025

Nir Gazit
Co-Founder and CEO
Hallucinations in RAG apps undermine trust when answers stray from retrieved context. This post explains how to detect them automatically using Faithfulness checks, an LLM-as-a-Judge, and claim-by-claim verification.
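Claim-by-claim verification can be sketched as a loop that scores each claim against the retrieved context. In a real pipeline the support check is an LLM-as-a-Judge call; here it is a naive substring match purely so the example runs standalone:

```python
def faithfulness(answer_claims, context, is_supported=None):
    """Fraction of answer claims supported by the retrieved context.
    is_supported would normally be an LLM-as-a-Judge call; the default
    is a naive substring check so this sketch stays self-contained."""
    check = is_supported or (lambda claim, ctx: claim.lower() in ctx.lower())
    if not answer_claims:
        return 1.0
    supported = sum(1 for c in answer_claims if check(c, context))
    return supported / len(answer_claims)

context = "The Eiffel Tower is in Paris. It opened in 1889."
claims = [
    "The Eiffel Tower is in Paris",
    "It opened in 1889",
    "It is 500m tall",  # unsupported: a likely hallucination
]
score = faithfulness(claims, context)  # 2 of 3 claims supported
```

A score below some threshold (say 1.0 for strict settings) flags the answer for review, with the unsupported claim pinpointing exactly where it strayed.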
Read more →
September 2025

Nir Gazit
Co-Founder and CEO
Manual spreadsheet checks don’t scale for LLM evaluation. This post shows how to build an automated framework with datasets, evaluators, monitoring, and alerts to ensure faithfulness and relevance at scale.
Read more →
September 2025

Nir Gazit
Co-Founder and CEO
Debugging multi-step LLM agents is complex without proper visibility. This guide shows how to trace agent execution using observability techniques by capturing prompts, tool calls, retrieval steps, and evaluation metrics to pinpoint failures with precision. Learn what signals to log, how to instrument your pipeline with OpenTelemetry, and how to analyze common failure scenarios. Finally, see how platforms like Traceloop streamline tracing, versioning, and monitoring so you can move from guesswork to clear insights.
Read more →
September 2025

Nir Gazit
Co-Founder and CEO
A/B testing prompts isn’t just about swapping lines of text. It’s a structured process. This guide breaks down the five key steps to running reliable A/B tests for LLM prompts in production, from forming a clear hypothesis and defining robust metrics to splitting traffic safely and analyzing results with statistical rigor. Learn how modern platforms like Traceloop simplify prompt management, evaluation, and deployment so you can continuously improve user satisfaction and performance.
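The "statistical rigor" step usually means a two-proportion z-test on a binary success metric (e.g. thumbs-up rate). A minimal sketch, with made-up sample counts:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for comparing success rates of prompt variants A and B."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)          # pooled success rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled standard error
    return (p_b - p_a) / se

# Variant A: 400/1000 thumbs-up (40%); variant B: 450/1000 (45%)
z = two_proportion_z(400, 1000, 450, 1000)
significant = abs(z) > 1.96  # 5% significance level, two-sided
```

With these numbers z is roughly 2.26, so the 5-point lift clears the 1.96 bar; with 100 users per arm the same lift would not, which is why the traffic-split sizing step matters.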
Read more →
September 2025

Nir Gazit
Co-Founder and CEO
RAG apps can break in production even when the code hasn’t changed. The culprit is rarely a bug. More often, the cause is data drift, concept drift, flawed chunking, embedding limitations, noisy context, or external dependencies. This article explains the hidden causes of silent degradation in RAG systems and how continuous monitoring and evaluation can help keep outputs accurate, relevant, and trustworthy over time.
Read more →
September 2025

Nir Gazit
Co-Founder and CEO
Learn how to evaluate whether your LLM outputs are truly satisfying users. This guide breaks down a practical 5-step framework to measure user satisfaction, track implicit signals like retries and abandonment, detect hallucinations, collect explicit feedback, trace root causes, and build a continuous improvement loop. Perfect for teams looking to turn unpredictable LLM behavior into reliable, high-performing applications.
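The implicit-signal step boils down to rate computations over session logs. A sketch with hypothetical field names (retried, abandoned) standing in for however your analytics tags sessions:

```python
def implicit_signals(sessions):
    """Retry and abandonment rates: cheap proxies for user dissatisfaction."""
    n = len(sessions)
    retries = sum(s["retried"] for s in sessions)    # user re-asked the question
    abandons = sum(s["abandoned"] for s in sessions)  # user left mid-interaction
    return {"retry_rate": retries / n, "abandon_rate": abandons / n}

sessions = [
    {"retried": True,  "abandoned": False},
    {"retried": False, "abandoned": False},
    {"retried": False, "abandoned": True},
    {"retried": False, "abandoned": False},
]
rates = implicit_signals(sessions)  # 25% retry, 25% abandonment
```

A rising retry rate often precedes explicit negative feedback, which is why the framework tracks both.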
Read more →
July 2025

Nir Gazit
Co-Founder and CEO
Traceloop auto-instruments your LangChain RAG pipeline, exports spans via OpenTelemetry, and ships ready-made Grafana dashboards. Turn on the built-in Faithfulness and QA Relevancy monitors in the Traceloop UI, import the dashboards, and set a simple alert (e.g., >5% flagged spans in 5 minutes) to catch and reduce hallucinations in production, with no custom evaluator code required.
Read more →
March 2024

Nir Gazit
Co-Founder and CEO
Tokenization, the process of breaking text into smaller units called tokens, forms the foundation of how Large Language Models like BERT and GPT understand and generate human language. This article explores different tokenization algorithms, their applications in Natural Language Processing tasks, common challenges, and how they compare to other text evaluation metrics, highlighting tokenization's crucial role in bridging human and machine understanding.
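For intuition, word-level tokenization can be sketched in one regular expression. This is a toy illustration only: production models such as BERT and GPT use learned subword schemes (WordPiece, BPE) rather than anything this simple:

```python
import re

def tokenize(text):
    """Toy word-level tokenizer: runs of word characters, or single
    punctuation marks. Subword tokenizers split further, e.g. rare
    words into frequent fragments."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Tokenization isn't magic.")
# ['Tokenization', 'isn', "'", 't', 'magic', '.']
```

Note how the apostrophe alone splits "isn't" into three tokens; subword algorithms make such splits frequency-driven instead of rule-driven.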
Read more →
April 2024

Nir Gazit
Co-Founder and CEO
Deploying Large Language Models (LLMs) requires careful planning across security, infrastructure, and monitoring to ensure successful production implementation. This article provides a practical checklist covering essential steps from defining objectives to gathering user feedback, helping teams navigate the complexities of LLM deployment while maintaining optimal performance.
Read more →
February 2024

Nir Gazit
Co-Founder and CEO
While Large Language Models (LLMs) are commonly used to evaluate other LLMs' performance, new research shows they produce highly inconsistent scores when rating the same text multiple times, making them unreliable as evaluation tools. The article suggests using established, deterministic metrics like BLEU and ROUGE instead, or implementing statistical methods that account for this scoring variability, to more accurately compare different LLM models and prompts.
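The inconsistency the research describes is easy to quantify: score the same text repeatedly and look at the spread. A sketch with made-up scores on a 1-10 scale:

```python
from statistics import mean, stdev

# Hypothetical repeated LLM-judge scores for one fixed text (1-10 scale).
scores = [7, 9, 6, 8, 10, 7, 9, 6]

mu = mean(scores)       # central tendency of the judge
sigma = stdev(scores)   # sample spread: large relative to the scale
# A standard deviation of ~1.5 on a 10-point scale means two prompts whose
# "true" quality differs by a point are indistinguishable from one sample,
# which is the case for averaging many judge runs or for deterministic
# metrics like BLEU/ROUGE.
```

Reporting a mean with a confidence interval, rather than a single judge score, is the statistical fix the article alludes to.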
Read more →
October 2023

Nir Gazit
Co-Founder and CEO
OpenLLMetry has been released as an open-source extension to OpenTelemetry, providing observability tools for LLM and AI applications without vendor lock-in, similar to how OpenTelemetry revolutionized cloud observability. The tool offers instrumentations for various LLM platforms (like OpenAI and Anthropic), vector databases, and frameworks, allowing developers to track LLM responses, prompts, and performance metrics while maintaining the flexibility to use any observability platform they choose.
Read more →