How do you design observability in Ktor: structured logs, request IDs, metrics, and tracing?

Question

Accepted Answer

Observability in Ktor: logs, metrics, tracing

Structured logs with a request ID

Use SLF4J + Logback with a JSON encoder (for example, logstash-logback-encoder) and put requestId into the MDC for correlation:

// build.gradle.kts
dependencies {
    implementation("io.ktor:ktor-server-call-logging-jvm:2.3.11")
    implementation("net.logstash.logback:logstash-logback-encoder:7.4")
}

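<!-- logback.xml: console appender and the name "JSON" are chosen for illustration -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <includeMdcKeyName>requestId</includeMdcKeyName>
      <includeMdcKeyName>userId</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>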

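// Application.kt
import io.ktor.server.application.*
import io.ktor.server.auth.*
import io.ktor.server.auth.jwt.* // JWTPrincipal; needs ktor-server-auth-jwt
import io.ktor.server.plugins.callloging.* // the package is spelled this way in Ktor 2.x
import io.ktor.server.request.*
import io.ktor.util.AttributeKey
import org.slf4j.event.Level
import java.util.UUID

// Caches the ID on the call so every MDC evaluation sees the same value
val RequestIdKey = AttributeKey<String>("requestId")

install(CallLogging) {
    level = Level.INFO
    mdc("requestId") { call ->
        call.attributes.computeIfAbsent(RequestIdKey) {
            call.request.header("X-Request-Id") ?: UUID.randomUUID().toString()
        }
    }
    mdc("userId") { call ->
        // Null (and therefore absent from the MDC) for unauthenticated calls
        call.principal<JWTPrincipal>()?.payload?.getClaim("sub")?.asString()
    }
    format { call ->
        val duration = call.processingTimeMillis()
        "${call.request.httpMethod.value} ${call.request.uri} " +
        "-> ${call.response.status()} in ${duration}ms"
    }
}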
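
As an alternative to a hand-written MDC provider, Ktor's CallId plugin can own the request ID: it reads the header, generates a fallback, echoes the ID back to the client, and exposes it to CallLogging through callIdMdc. The sketch below assumes the ktor-server-call-id-jvm artifact is on the classpath; isValidRequestId and configureRequestId are hypothetical names, and the 64-character limit is an arbitrary choice.

```kotlin
import io.ktor.http.HttpHeaders
import io.ktor.server.application.Application
import io.ktor.server.application.install
import io.ktor.server.plugins.callid.CallId
import io.ktor.server.plugins.callid.callIdMdc
import io.ktor.server.plugins.callloging.CallLogging
import org.slf4j.event.Level
import java.util.UUID

// Hypothetical helper: accept only short IDs made of letters, digits, '-' and '_',
// so a client-supplied header cannot inject newlines or huge values into the logs.
fun isValidRequestId(id: String): Boolean =
    id.length in 1..64 && id.all { it.isLetterOrDigit() || it == '-' || it == '_' }

fun Application.configureRequestId() {
    install(CallId) {
        // Prefer the ID sent by the client or an upstream proxy
        retrieveFromHeader(HttpHeaders.XRequestId)
        // Otherwise generate one per call
        generate { UUID.randomUUID().toString() }
        // Reject IDs that fail the check above
        verify { isValidRequestId(it) }
        // Return the effective ID so the caller can quote it in bug reports
        replyToHeader(HttpHeaders.XRequestId)
    }
    install(CallLogging) {
        level = Level.INFO
        // Copies the call ID into the MDC under the key used in logback.xml
        callIdMdc("requestId")
    }
}
```

With this setup the ID is resolved once per call, so log lines, the response header, and any downstream calls that forward call.callId all share the same value.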

Metrics via Micrometer + Prometheus

dependencies {
    implementation("io.ktor:ktor-server-metrics-micrometer-jvm:2.3.11")
    implementation("io.micrometer:micrometer-registry-prometheus:1.12.5")
}

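import io.ktor.server.metrics.micrometer.*
import io.micrometer.core.instrument.binder.jvm.ClassLoaderMetrics
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics
import io.micrometer.core.instrument.binder.system.ProcessorMetrics
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig
import io.micrometer.prometheus.PrometheusConfig
import io.micrometer.prometheus.PrometheusMeterRegistry

val prometheusRegistry = PrometheusMeterRegistry(PrometheusConfig.DEFAULT)

install(MicrometerMetrics) {
    registry = prometheusRegistry
    // Publish histogram buckets plus client-side p50/p95/p99 for request timers
    distributionStatisticConfig = DistributionStatisticConfig.Builder()
        .percentilesHistogram(true)
        .percentiles(0.5, 0.95, 0.99)
        .build()
    // JVM and system binders (these register extra meters; they do not add tags)
    meterBinders = listOf(
        ClassLoaderMetrics(),
        JvmMemoryMetrics(),
        JvmGcMetrics(),
        ProcessorMetrics()
    )
}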

// Prometheus scrape endpoint
routing {
    get("/metrics") {
        call.respondText(prometheusRegistry.scrape(), ContentType.Text.Plain)
    }
}
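
Tags that should appear on every metric are set on the registry itself rather than through meterBinders. A minimal sketch, assuming micrometer-core is available; the tag value "ktor-app", the meter name orders.created, and the helper names are illustrative, and SimpleMeterRegistry stands in for the Prometheus registry so the example runs without a scraper.

```kotlin
import io.micrometer.core.instrument.Counter
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.simple.SimpleMeterRegistry

// Adds a tag to every meter registered after this call.
fun applyCommonTags(registry: MeterRegistry) {
    registry.config().commonTags("application", "ktor-app")
}

// A hypothetical business counter registered next to Ktor's built-in HTTP meters.
fun ordersCreatedCounter(registry: MeterRegistry): Counter =
    Counter.builder("orders.created")
        .description("Number of orders created")
        .register(registry)

fun main() {
    val registry = SimpleMeterRegistry()
    applyCommonTags(registry)
    val counter = ordersCreatedCounter(registry)
    counter.increment()
    // Prints the meter name, its tags (including the common one), and the count
    println("${counter.id.name} ${counter.id.tags} = ${counter.count()}")
}
```

In the application, the same two helpers would receive prometheusRegistry, with applyCommonTags called before install(MicrometerMetrics) so that Ktor's own meters pick up the tag as well.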

Distributed tracing via OpenTelemetry

dependencies {
    implementation("io.opentelemetry:opentelemetry-api:1.38.0")
    implementation("io.opentelemetry:opentelemetry-sdk:1.38.0")
    implementation("io.opentelemetry:opentelemetry-exporter-otlp:1.38.0")
    implementation("io.opentelemetry.instrumentation:opentelemetry-ktor-2.0:2.4.0-alpha")
}

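import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.instrumentation.ktor.v2_0.server.KtorServerTracing

val tracer: Tracer = GlobalOpenTelemetry.getTracer("ktor-app")

// Ktor plugin: automatic instrumentation of incoming requests.
// The 2.4.0-alpha artifact names the plugin KtorServerTracing; newer releases
// expose it as KtorServerTelemetry, so check the version you pin.
install(KtorServerTracing) {
    setOpenTelemetry(GlobalOpenTelemetry.get())
}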

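import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.extension.kotlin.asContextElement // needs opentelemetry-extension-kotlin
import kotlinx.coroutines.withContext

// Manual instrumentation of business logic
suspend fun UserService.getById(id: String): User {
    val span = tracer.spanBuilder("UserService.getById")
        .setAttribute("user.id", id)
        .startSpan()
    // makeCurrent() is thread-local and unreliable across suspension points;
    // a coroutine context element keeps the span current on whichever thread resumes.
    return withContext(span.asContextElement()) {
        try {
            repository.findById(id)
        } catch (e: Exception) {
            span.recordException(e)
            span.setStatus(StatusCode.ERROR)
            throw e
        } finally {
            span.end()
        }
    }
}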
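
GlobalOpenTelemetry.get() returns a no-op instance unless an SDK has been registered first (or the Java agent is attached), so the plugin and tracer above record nothing on their own. A sketch of a manual bootstrap using the SDK and OTLP exporter from the dependency list; the endpoint http://localhost:4317, the function name buildOpenTelemetry, and the choice of a batch span processor are assumptions.

```kotlin
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator
import io.opentelemetry.context.propagation.ContextPropagators
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

// Builds an SDK that batches spans and sends them over OTLP/gRPC.
// The default endpoint is an assumption (a collector on the local machine).
fun buildOpenTelemetry(otlpEndpoint: String = "http://localhost:4317"): OpenTelemetrySdk {
    val exporter = OtlpGrpcSpanExporter.builder()
        .setEndpoint(otlpEndpoint)
        .build()
    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
        .build()
    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        // W3C traceparent headers, so incoming and outgoing calls join one trace
        .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
        .build()
}
```

The returned instance can be passed straight to setOpenTelemetry(...) in the plugin block, or registered once at startup with GlobalOpenTelemetry.set(...) before anything calls GlobalOpenTelemetry.get(). Closing the SDK on shutdown flushes spans still sitting in the batch queue.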

Health check endpoints

routing {
    get("/health/live") {
        call.respond(HttpStatusCode.OK, mapOf("status" to "UP"))
    }
    get("/health/ready") {
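        // Borrow a connection and validate it; the 1-second timeout is an arbitrary choice
        val dbOk = runCatching { dataSource.connection.use { it.isValid(1) } }.getOrDefault(false)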
        val status = if (dbOk) HttpStatusCode.OK else HttpStatusCode.ServiceUnavailable
        call.respond(status, mapOf("db" to dbOk))
    }
}

Pitfalls

MDC values are lost when a coroutine switches context; use MDCContext() from kotlinx-coroutines-slf4j: withContext(Dispatchers.IO + MDCContext()) { ... }.
Without auth, /metrics is readable by anyone; restrict the endpoint with a network policy or Basic Auth.
Prometheus's built-in default scrape_interval is 1 minute, and many sample configs use 15 seconds; either can be too coarse for tracking a P99 SLO under load, so consider lowering it to about 5 seconds.
The OpenTelemetry Java agent (the javaagent JAR) can clash with in-code instrumentation such as the Ktor plugin, for example by producing duplicate spans; choose one approach.
Logback's AsyncAppender can lose the last log events on shutdown unless the logger context is stopped explicitly (LoggerContext.stop() or a <shutdownHook/> in logback.xml) so the queue is flushed; includeCallerData only controls caller data and does not affect this.
CallLogging does not log request bodies by default; add body logging only for debugging and with a size limit.
Micrometer histograms enabled with percentilesHistogram(true) create many time series; agree on this with your ops team.
Span IDs are not propagated into Logback automatically; explicit integration is needed, for example the OpenTelemetryAppender from the opentelemetry-logback-mdc-1.0 instrumentation module, which copies trace_id and span_id into the MDC.
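
The first pitfall can be sketched in isolation. This assumes kotlinx-coroutines-slf4j plus an SLF4J binding with a real MDC implementation (Logback, as used above) on the classpath; requestIdOnIoThread and the value "demo-123" are illustrative.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.slf4j.MDCContext
import kotlinx.coroutines.withContext
import org.slf4j.MDC

// Reads the MDC on an IO thread. MDCContext() snapshots the caller's MDC map
// and installs it on whichever thread runs the block.
suspend fun requestIdOnIoThread(): String? =
    withContext(Dispatchers.IO + MDCContext()) {
        MDC.get("requestId")
    }

fun main() = runBlocking {
    MDC.put("requestId", "demo-123") // illustrative value
    println(requestIdOnIoThread())
}
```

Note that MDCContext() captures the map at the moment it is constructed, so values put into the MDC inside the block are not carried back to the caller.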
