Observability
Viaduct provides comprehensive observability features to help you monitor, debug, and optimize your GraphQL API. This includes built-in metrics collection, error tracking, and integration with popular monitoring systems through Micrometer.
Overview
Viaduct automatically tracks key metrics across your GraphQL operations without requiring any code changes in your resolvers. The system focuses on three main areas:
- Latency monitoring - Track execution times across operations, fields, and resolvers
- Error tracking - Monitor success/failure rates and attribute errors to specific components
- Attribution - Understand which components contribute to latency and errors
Documentation Structure
- Setup Guide - How to enable and configure metrics in your Viaduct instance
- Error Handling - Custom error reporting and handling
Quick Start
Enable observability in your Viaduct instance by providing a MeterRegistry:
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry
import viaduct.service.ViaductBuilder

// SimpleMeterRegistry keeps metrics in memory; substitute a registry for your
// monitoring backend (Prometheus, Datadog, etc.) in production.
val meterRegistry: MeterRegistry = SimpleMeterRegistry()

val viaduct = ViaductBuilder()
    .withMeterRegistry(meterRegistry)
    .withTenantAPIBootstrapperBuilder(myBootstrapper)
    .build()
Once configured, Viaduct automatically emits metrics for all GraphQL operations. See the Setup Guide for detailed configuration options.
Available Metrics
Viaduct emits three primary metric types, all implemented as Micrometer Timers:
1. viaduct.execution
Full execution lifecycle metric measuring end-to-end execution time for the entire GraphQL request, from parsing through response serialization.
Measurements:
- Duration (timer) with percentiles: p50, p75, p90, p95
- Count (number of executions)
Tags:
- operation_name - GraphQL operation name from the query (e.g., GetUser, SearchProducts)
  - Only present if the operation is named in the query
- success - Execution success indicator
  - true - No exceptions thrown AND data is present in the response
  - false - Exception occurred OR no data in response (even if partial data exists)
Use cases:
- Monitor overall API health and performance
- Track SLAs at the operation level
- Identify operations with high error rates
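As an illustration, the viaduct.execution timer can be inspected directly through Micrometer's search API. The sketch below is a minimal example, assuming the meterRegistry from the Quick Start and a hypothetical operation named GetUser; the metric and tag names follow the table above.

import java.util.concurrent.TimeUnit

// Look up the end-to-end timer for one named, successful operation.
// "GetUser" is a hypothetical operation name; tag names follow the table above.
val executionTimer = meterRegistry
    .find("viaduct.execution")
    .tag("operation_name", "GetUser")
    .tag("success", "true")
    .timer()

executionTimer?.let {
    println("GetUser succeeded ${it.count()} times")
    println("mean: ${it.mean(TimeUnit.MILLISECONDS)} ms, max: ${it.max(TimeUnit.MILLISECONDS)} ms")
}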
2. viaduct.operation
Operation-level metric measuring the time to execute the specific GraphQL operation after parsing and validation.
Measurements:
- Duration (timer) with percentiles: p50, p75, p90, p95
- Count (number of executions)
Tags:
- operation_name - GraphQL operation definition name from the query document
  - Only present if the operation definition includes a name
- success - Execution success indicator
  - true - No exceptions thrown AND data is present in the response
  - false - Exception occurred OR no data in response
Use cases:
- Measure execution performance excluding parsing/validation overhead
- Compare performance across different operations
- Identify slow operations for optimization
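One practical use is to subtract this metric from viaduct.execution for the same operation: the gap approximates the parsing, validation, and serialization work that happens outside the operation itself. A rough sketch, assuming the meterRegistry from the Quick Start and a hypothetical GetUser operation (the helper name is ours):

import java.util.concurrent.TimeUnit

// Approximate the per-request work outside the operation itself
// (parsing, validation, response serialization) by comparing mean durations.
fun overheadOutsideOperationMs(operation: String): Double? {
    val execution = meterRegistry.find("viaduct.execution")
        .tag("operation_name", operation).timer() ?: return null
    val op = meterRegistry.find("viaduct.operation")
        .tag("operation_name", operation).timer() ?: return null
    return execution.mean(TimeUnit.MILLISECONDS) - op.mean(TimeUnit.MILLISECONDS)
}

println("GetUser overhead: ${overheadOutsideOperationMs("GetUser")} ms")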
3. viaduct.field
Field-level metric measuring the time to fetch/resolve individual GraphQL fields. This is the most granular metric and helps identify specific bottlenecks.
Measurements:
- Duration (timer) with percentiles: p50, p75, p90, p95
- Count (number of field resolutions)
Tags:
- operation_name - GraphQL operation name (if available)
- field - Fully qualified field path
  - Format: ParentType.fieldName (e.g., User.email, Query.searchProducts)
  - For root fields: just fieldName if the parent type is unavailable
- success - Field resolution success indicator
  - true - No exception thrown during field fetch
  - false - Exception thrown during field resolution
Use cases:
- Identify slow fields and resolvers
- Monitor error rates for specific fields
- Understand which fields contribute most to overall latency
- Attribute performance issues to specific tenant modules
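For example, a quick way to surface bottlenecks is to rank the viaduct.field timers by mean duration. A minimal sketch, assuming the meterRegistry from the Quick Start; note that each timer corresponds to one tag combination (operation, field, success):

import java.util.concurrent.TimeUnit

// Rank field timers by mean resolution time to spot slow resolvers.
val slowestFields = meterRegistry.find("viaduct.field").timers()
    .sortedByDescending { it.mean(TimeUnit.MILLISECONDS) }
    .take(10)

slowestFields.forEach { timer ->
    val field = timer.id.getTag("field") ?: "unknown"
    println("$field: mean=${timer.mean(TimeUnit.MILLISECONDS)} ms, count=${timer.count()}")
}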
Metric Collection Details
- Automatic collection - Metrics are collected automatically, requiring no changes to your resolver code
- Percentile distribution - All metrics include p50, p75, p90, and p95 percentiles
- Low overhead - Minimal performance impact on your GraphQL operations
- Accurate attribution - Metrics correctly identify which operations and fields are responsible for latency and errors
Use Cases
Latency Analysis
- Determine latency across various percentiles for operations and fields (see the percentile sketch after this list)
- Identify critical paths in request execution that contribute to overall latency
- Understand why specific fields or resolvers are slow
- Attribute slowness to specific tenant modules or components
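Percentile values can also be read programmatically from a timer's histogram snapshot. A sketch assuming the p50/p75/p90/p95 percentiles listed above are published; User.email is only an example field path, not part of the API:

import java.util.concurrent.TimeUnit

// Print the published latency percentiles for one field timer.
meterRegistry.find("viaduct.field")
    .tag("field", "User.email")
    .timer()
    ?.takeSnapshot()
    ?.percentileValues()
    ?.forEach { p ->
        println("p${(p.percentile() * 100).toInt()}: ${p.value(TimeUnit.MILLISECONDS)} ms")
    }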
Error Monitoring
- Monitor partial and full failure rates for operations (see the error-rate sketch after this list)
- Identify which fields or resolvers are causing errors
- Track error rates over time to detect regressions
- Attribute errors to responsible components for faster debugging
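Because every timer carries the success tag, failure rates can be derived by comparing counts for the two tag values. A minimal sketch (the helper name and the SearchProducts operation are ours, not part of the API):

// Compute a failure ratio for one operation from the success tag.
fun operationErrorRate(operation: String): Double {
    fun count(success: String): Long =
        meterRegistry.find("viaduct.operation")
            .tag("operation_name", operation)
            .tag("success", success)
            .timer()?.count() ?: 0L

    val failures = count("false")
    val total = failures + count("true")
    return if (total == 0L) 0.0 else failures.toDouble() / total
}

println("SearchProducts error rate: ${operationErrorRate("SearchProducts")}")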
Understanding Field Dependencies
Field-level metrics help you understand relationships between fields:
- Field dependencies - See which fields trigger the resolution of other fields
- Resolver impact - Track how resolver performance affects overall request latency
- Execution frequency - Monitor how often specific fields are resolved
- Critical path analysis - Identify which fields contribute most to request latency
For example, as a tenant developer you can:
- Understand why your field is slow by examining dependent field metrics
- See which operations most frequently trigger your field resolution
- Monitor error rates for fields your resolvers depend on
Integration with Monitoring Systems
Viaduct uses Micrometer as its metrics facade, enabling integration with many monitoring systems including Prometheus, Datadog, CloudWatch, StatsD, Graphite, and more.
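For example, exporting to Prometheus typically only requires passing a Prometheus-backed registry to the builder. The sketch below assumes the micrometer-registry-prometheus dependency is on the classpath (the class and package names are from that artifact) and mirrors the Quick Start above:

import io.micrometer.prometheus.PrometheusConfig
import io.micrometer.prometheus.PrometheusMeterRegistry
import viaduct.service.ViaductBuilder

// A Prometheus-backed registry in place of SimpleMeterRegistry.
val prometheusRegistry = PrometheusMeterRegistry(PrometheusConfig.DEFAULT)

val viaduct = ViaductBuilder()
    .withMeterRegistry(prometheusRegistry)
    .withTenantAPIBootstrapperBuilder(myBootstrapper)
    .build()

// Expose prometheusRegistry.scrape() on an HTTP endpoint (e.g. /metrics)
// so Prometheus can collect the viaduct.* timers.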
Next Steps
- Setup Guide - Learn how to configure metrics collection
- Error Handling - Configure custom error reporting