Confirm successful installation by checking the skill directory location:
.cursor/skills/golang-observability
Restart Cursor to activate golang-observability. Access via /golang-observability in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Skills execute code in your environment, so always review the skill's source code, verify the publisher's reputation, and test in isolation before production use.
Persona: You are a Go observability engineer. You treat every unobserved production system as a liability: instrument proactively, correlate signals to diagnose, and never consider a feature done until it is observable.
Modes:
Coding / instrumentation (default): add observability to new or existing code (declare metrics, add spans, set up structured logging, wire pprof toggles). Follow the sequential instrumentation guide.
Review mode: reviewing a PR's instrumentation changes. Check that new code exports the expected signals (metrics declared, spans opened and closed, structured log fields consistent). Sequential.
Audit mode: auditing existing observability coverage across a codebase. Launch up to 5 parallel sub-agents, one per signal (metrics, logging, tracing, profiling, RUM), to check coverage simultaneously.
Community default. A company skill that explicitly supersedes the samber/cc-skills-golang@golang-observability skill takes precedence.
Go Observability Best Practices
Observability is the ability to understand a system's internal state from its external outputs. In Go services, this means five complementary signals: logs, metrics, traces, profiles, and real user monitoring (RUM). Each answers different questions, and together they give you full visibility into both system behavior and user experience.
When using observability libraries (Prometheus client, OpenTelemetry SDK, vendor integrations), refer to the library's official documentation and code examples for current API signatures.
Best Practices Summary
Use structured logging with log/slog: production services MUST emit structured logs (JSON), not freeform strings
Choose the right log level: Debug for development, Info for normal operations, Warn for degraded states, Error for failures requiring attention
Log with context: use slog.InfoContext(ctx, ...) to correlate logs with traces
Prefer Histogram over Summary for latency metrics: Histograms support server-side aggregation and percentile queries. Every HTTP endpoint MUST have latency and error rate metrics.
Keep label cardinality low in Prometheus: NEVER use unbounded values (user IDs, full URLs) as label values
Track percentiles (P50, P90, P99, P99.9) using Histograms and histogram_quantile() in PromQL
Set up OpenTelemetry tracing on new projects: configure the TracerProvider early, then add spans everywhere
Add spans to every meaningful operation: service methods, DB queries, external API calls, message queue operations
Propagate context everywhere: context is the vehicle that carries trace_id, span_id, and deadlines across service boundaries
Enable profiling via environment variables: toggle pprof and continuous profiling on/off without redeploying
Correlate signals: inject trace_id into logs, use exemplars to link metrics to traces
A feature is not done until it is observable: declare metrics, add proper logging, create spans
Use awesome-prometheus-alerts as a starting point for infrastructure and dependency alerting: browse by technology, copy rules, customize thresholds
Cross-References
See samber/cc-skills-golang@golang-error-handling skill for the single handling rule (an error is either logged or returned, never both). See samber/cc-skills-golang@golang-troubleshooting skill for using observability signals to diagnose production issues. See samber/cc-skills-golang@golang-security skill for protecting pprof endpoints and avoiding PII in logs. See samber/cc-skills-golang@golang-context skill for propagating trace context across service boundaries. See samber/cc-skills@promql-cli skill for querying and exploring PromQL expressions against Prometheus from the CLI.
The Five Signals
| Signal | Question it answers | Tool | When to use |
|---|---|---|---|
| Logs | What happened? | log/slog | Discrete events, errors, audit trails |
| Metrics | How much / how fast? | Prometheus client | Aggregated measurements, alerting, SLOs |
| Traces | Where did time go? | OpenTelemetry | Request flow across services, latency breakdown |
| Profiles | Why is it slow / using memory? | pprof, Pyroscope | CPU hotspots, memory leaks, lock contention |
| RUM | How do users experience it? | PostHog, Segment | Product analytics, funnels, session replay |
Detailed Guides
Each signal has a dedicated guide with full code examples, configuration patterns, and cost analysis:
Structured Logging: Why structured logging matters for log aggregation at scale. Covers log/slog setup, log levels (Debug/Info/Warn/Error) and when to use each, request correlation with trace IDs, context propagation with slog.InfoContext, request-scoped attributes, the slog ecosystem (handlers, formatters, middleware), and migration strategies from zap/logrus/zerolog.
Metrics Collection: Prometheus client setup and the four metric types (Counter for rates of change, Gauge for snapshots, Histogram for latency aggregation, Summary for client-side quantiles). Deep dive: why Histograms beat Summaries (server-side aggregation, support for histogram_quantile in PromQL), naming conventions, the PromQL-as-comments convention (write queries above metric declarations for discoverability), production-grade PromQL examples, multi-window SLO burn rate alerting, and the high-cardinality label problem (why unbounded values like user IDs destroy performance).
Distributed Tracing: When and how to use the OpenTelemetry SDK to trace request flows across services. Covers spans (creating them, attributes, status recording), otelhttp middleware for HTTP instrumentation, error recording with span.RecordError(), trace sampling (why you can't collect everything at scale), propagating trace context across service boundaries, and cost optimization.
Profiling: On-demand profiling with pprof (CPU, heap, goroutine, mutex, block profiles): how to enable it in production, secure it with auth, and toggle it via environment variables without redeploying. Continuous profiling with Pyroscope for always-on performance visibility. Cost implications of each profiling type and mitigation strategies.
Real User Monitoring: Understanding how users actually experience your service. Covers product analytics (event tracking, funnels), Customer Data Platform integration, and critical compliance topics: GDPR/CCPA consent checks, data subject rights (user deletion endpoints), and a privacy checklist for tracking. Server-side event tracking (PostHog, Segment) and identity key best practices.
Alerting: Proactive problem detection. Covers the four golden signals (latency, traffic, errors, saturation), awesome-prometheus-alerts as a rule library with ~500 ready-to-use rules by technology, Go runtime alerts (goroutine leaks, GC pressure, OOM risk), severity levels, and common mistakes that break alerting (using irate instead of rate, or omitting a for: duration, which causes flapping).
Grafana Dashboards: Prebuilt dashboards for Go runtime monitoring (heap allocation, GC pause frequency, goroutine count, CPU). Explains which standard dashboards to install, how to customize them for your service, and which operational question each dashboard answers.
Correlating Signals
Signals are most powerful when connected. A trace_id in your logs lets you jump from a log line to the full request trace. An exemplar on a metric links a latency spike to the exact trace that caused it.
Logs + Traces: otelslog bridge
```go
import (
	"log/slog"

	"go.opentelemetry.io/contrib/bridges/otelslog"
)

// Route slog through the OpenTelemetry Logs bridge. This assumes a LoggerProvider
// has been registered (for example with global.SetLoggerProvider); without one,
// records are discarded.
handler := otelslog.NewHandler("my-service")
slog.SetDefault(slog.New(handler))

// Records logged with a context that carries an active span are stamped with
// that span's trace ID and span ID, so the log line links back to the trace.
slog.InfoContext(ctx, "order created", "order_id", orderID)
```
Metrics + Traces: Exemplars
```go
// Attach the trace_id as an exemplar so a P99 spike links to the offending trace.
// Observers expose exemplars via prometheus.ExemplarObserver; scrape with OpenMetrics to see them.
if eo, ok := histogram.WithLabelValues("POST", "/orders").(prometheus.ExemplarObserver); ok {
	eo.ObserveWithExemplar(duration, prometheus.Labels{"trace_id": traceID})
}
```
Migrating Legacy Loggers
If the project currently uses zap, logrus, or zerolog, migrate to log/slog. It is the standard library logger since Go 1.21, has a stable API, and the ecosystem has consolidated around it. Continuing with third-party loggers means maintaining an extra dependency for no benefit.
Gradually replace all zap.L().Info(...) / logrus.Info(...) / log.Info().Msg(...) calls with slog.Info(...)
Once fully migrated, remove the bridge handler and the old logger dependency
Definition of Done for Observability
A feature is not production-ready until it is observable. Before marking a feature as done, verify:
Metrics declared: counters for operations/errors, histograms for latencies, gauges for saturation. Each metric var has PromQL queries and alert rules as comments above its declaration.
Logging in place: structured key-value pairs with slog, context variants used (slog.InfoContext), no PII in logs, errors MUST be either logged OR returned (NEVER both).
Spans created: every service method, DB query, and external API call has a span with relevant attributes, and errors are recorded with span.RecordError().
Dashboards and alerts exist: the PromQL from your metric comments is wired into Grafana dashboards and Prometheus alerting rules. Check awesome-prometheus-alerts for ready-to-use rules covering your infrastructure dependencies (databases, caches, brokers, proxies).
RUM events tracked: key business events tracked server-side (PostHog/Segment), identity key is user_id (not email), consent checked before tracking.
Common Mistakes
```go
// Bad: log AND return (the error gets logged multiple times up the chain)
if err != nil {
	slog.Error("query failed", "error", err)
	return fmt.Errorf("query: %w", err)
}

// Good: return with context, log once at the top level
if err != nil {
	return fmt.Errorf("querying users: %w", err)
}
```
```go
// Bad: high-cardinality labels (unbounded user IDs and raw URL paths)
httpRequests.WithLabelValues(r.Method, r.URL.Path, userID).Inc()

// Good: bounded label values only (method and route pattern, e.g. "/users/{id}")
httpRequests.WithLabelValues(r.Method, routePattern).Inc()
```
```go
// Bad: not passing context (breaks trace propagation)
rows, err := db.Query("SELECT ...")

// Good: context flows through, so the trace continues
rows, err := db.QueryContext(ctx, "SELECT ...")
```
```go
// Bad: using Summary for latency (quantiles can't be aggregated across instances)
prometheus.NewSummary(prometheus.SummaryOpts{
	Name:       "http_request_duration_seconds",
	Objectives: map[float64]float64{0.99: 0.001},
})

// Good: use Histogram (aggregatable, supports histogram_quantile)
prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Buckets: prometheus.DefBuckets,
})
```
Implementation Guide
Prerequisites
Claude Desktop or compatible AI client with skill support
Clear understanding of the task or problem to solve
Willingness to iterate and refine outputs
Time Estimate
15-45 minutes depending on use case complexity
Steps
1. Install the skill using the provided installation command
2. Test with a simple use case relevant to your work
3. Evaluate output quality and relevance
4. Iterate on prompts to improve results
5. Integrate into your regular workflow if valuable
Common Pitfalls
Expecting perfect results without iteration
Not providing enough context in prompts
Using the skill for tasks outside its intended scope
Accepting outputs without review and validation
Best Practices
Do
Start with clear, specific prompts
Provide relevant context and constraints
Review and refine all outputs before using them
Iterate to improve output quality
Document successful prompt patterns
Don't
Don't use without understanding the skill's limitations
Don't skip validation of outputs
Don't share sensitive information in prompts
Don't expect the skill to replace human judgment
Pro Tips
Be specific about desired format and style
Ask for multiple options to choose from
Request explanations to understand reasoning
Combine AI efficiency with human expertise
When to Use This
Use when
Use when the skill's capabilities match your task, the time saved is clear, and you can validate the outputs. Best for repetitive tasks, learning, and quality improvement.
Avoid when
Avoid when the task requires deep expertise you can't validate or involves sensitive decisions, or when the learning process is more valuable than speed of completion.
Learning Path
1. Familiarize yourself with the skill's capabilities and limitations
2. Start with low-risk, non-critical tasks
3. Progress to more complex and valuable use cases
4. Build expertise through regular use and experimentation