Job Management in CNCF

Created: 2026-06-08

In CQRS-based systems, update operations are often executed asynchronously.

In such environments, tracking completion and retrieving results become important concerns.

To address this challenge, CNCF (Cloud Native Component Framework) provides Job Management.

This article introduces the CNCF Job Management model.

Why Job Management Exists

CNCF adopts CQRS.

CQRS separates update operations (Commands) from read operations (Queries). In many CQRS systems, update processing follows an eventually consistent model, which naturally leads to asynchronous execution.

The most important requirement of asynchronous execution is the ability to inspect results later.

Users and systems must be able to determine whether processing has completed, wait for completion when necessary, and retrieve the final result.

To address this requirement, CNCF provides Job Management.

What Job Management Provides

Job Management is more than an asynchronous execution mechanism. It also provides operational capabilities such as execution tracking, failure diagnosis, retry, and auditability.

  • Completion Tracking

  • Result Retrieval

  • Retry

  • Cancellation

  • Diagnostics

  • Audit Trail

In the AI era, preserving and managing execution data has significant value in itself. The history, inputs, results, failure reasons, retry history, and diagnostics retained by Jobs become operational data for failure analysis, performance tuning, security auditing, and AI-assisted root cause analysis.

Execution Modes

In CNCF, operations are executed as either Queries or Commands. Queries are synchronous, while Commands support the following four execution modes.

Sync

Direct synchronous execution. No Job record is created.

JobSync

Synchronous execution with Job tracking.

The result is returned synchronously, while Job lifecycle and diagnostics are retained.

JobAsync

Asynchronous Job execution.

The caller receives a Job identifier, while execution is performed later by workers.

The term asynchronous here primarily means asynchronous from the caller’s point of view. After the Job is accepted, the Command body executed by a worker usually proceeds as a synchronous processing unit.

However, after the Command body publishes an Event, derived processing triggered by that Event can be separated into another asynchronous Task. When an Event is published, follow-up processing can be defined either to run synchronously at that point or to run asynchronously as a separate Task. In other words, JobAsync does not mean that the entire Job is fragmented into asynchronous pieces. It means that execution is asynchronous relative to the caller, while internally CNCF can combine the synchronous Task that runs the Command body with synchronous or asynchronous follow-up processing triggered after Event publication.

JobSyncWithAsyncCont

The primary work executes synchronously, while residual work continues asynchronously.

Users receive an immediate result, while the remaining work remains traceable under the same Job.
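
The differences between the four modes can be sketched as a small dispatcher. This is a minimal illustration in Python, not CNCF's actual API: ExecutionMode, Outcome, JobStore, and execute are hypothetical names, and the worker queue is simulated by a plain list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional
import uuid


class ExecutionMode(Enum):
    SYNC = "Sync"
    JOB_SYNC = "JobSync"
    JOB_ASYNC = "JobAsync"
    JOB_SYNC_WITH_ASYNC_CONT = "JobSyncWithAsyncCont"


@dataclass
class Outcome:
    result: Any = None            # set when the primary work ran synchronously
    job_id: Optional[str] = None  # set whenever a Job record was created


@dataclass
class JobStore:
    """Hypothetical in-memory stand-in for Job Management."""
    records: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # work deferred to workers

    def create(self, mode: ExecutionMode) -> str:
        job_id = str(uuid.uuid4())
        self.records[job_id] = {"mode": mode, "result": None}
        return job_id


def execute(mode: ExecutionMode, command: Callable[[], Any], store: JobStore,
            continuation: Optional[Callable[[], None]] = None) -> Outcome:
    if mode is ExecutionMode.SYNC:
        return Outcome(result=command())           # no Job record is created
    job_id = store.create(mode)
    if mode is ExecutionMode.JOB_ASYNC:
        store.pending.append((job_id, command))    # a worker runs it later
        return Outcome(job_id=job_id)
    result = command()                             # JobSync and JobSyncWithAsyncCont
    store.records[job_id]["result"] = result
    if mode is ExecutionMode.JOB_SYNC_WITH_ASYNC_CONT and continuation is not None:
        store.pending.append((job_id, continuation))  # residual work, same Job
    return Outcome(result=result, job_id=job_id)
```

In this sketch only Sync returns without a Job ID, and only JobAsync returns without a result, which mirrors the distinctions described above.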

Job Lifecycle

A Job is not merely created; it is managed through states. The exact states depend on implementation details, but conceptually the lifecycle is as follows.

  • Accepted : accepted by Job Management

  • Scheduled : scheduled for execution

  • Running : currently executing

  • Succeeded : completed successfully

  • Failed : completed with failure

  • Cancelled : cancelled before completion

In JobAsync, processing is not complete at the time of invocation, so the caller retrieves the state later using the Job ID. Even in JobSync and JobSyncWithAsyncCont, the Job history retains state, results, and diagnostics, allowing execution details to be inspected afterward.
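
The conceptual lifecycle can be expressed as a small state machine. The state names come from the list above, but the transition table is an assumption made for illustration: the article does not specify which transitions are legal, and a real implementation might, for example, reopen a Failed Job on retry.

```python
from enum import Enum


class JobState(Enum):
    ACCEPTED = "Accepted"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Assumed transition table; the article lists the states but not the edges.
ALLOWED = {
    JobState.ACCEPTED: {JobState.SCHEDULED, JobState.CANCELLED},
    JobState.SCHEDULED: {JobState.RUNNING, JobState.CANCELLED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED, JobState.CANCELLED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
    JobState.CANCELLED: set(),
}


def is_terminal(state: JobState) -> bool:
    """A state is terminal when no further transition is allowed."""
    return not ALLOWED[state]


def transition(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

A caller polling by Job ID would typically stop once is_terminal returns True for the reported state.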

A Job typically retains the following information.

  • Job ID

  • Operation / Command name

  • requester / principal

  • accepted time

  • started time

  • completed time

  • execution mode

  • status

  • result summary

  • error / conclusion

  • diagnostic payload

  • retry count

  • cancellation reason

  • call tree

In addition to execution results, Jobs can record internal execution details such as the call tree. This provides useful information for failure analysis and performance improvement.
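
As a rough picture of what such a record might look like, the fields listed above map naturally onto a data class. The field names, types, and the elapsed_ms timing attribute below are assumptions for illustration, not CNCF's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class CallNode:
    """One node of the call tree: a named step and its nested steps."""
    name: str
    elapsed_ms: float = 0.0   # hypothetical timing field
    children: list["CallNode"] = field(default_factory=list)


@dataclass
class JobRecord:
    job_id: str
    operation: str
    requester: str
    execution_mode: str
    status: str = "Accepted"
    accepted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    result_summary: Optional[str] = None
    error: Optional[str] = None
    diagnostics: dict[str, Any] = field(default_factory=dict)
    retry_count: int = 0
    cancellation_reason: Optional[str] = None
    call_tree: Optional[CallNode] = None


def slowest_leaf(node: CallNode) -> CallNode:
    """Return the leaf step with the largest elapsed time."""
    if not node.children:
        return node
    return max((slowest_leaf(c) for c in node.children), key=lambda n: n.elapsed_ms)
```

A query such as slowest_leaf shows why retaining the call tree is useful: performance analysis can be run against stored Job data after the fact, without re-executing the Command.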

Caller Interaction Model

For asynchronous Commands, the caller receives a Job ID instead of the final result immediately. The caller can then check status, wait for completion, and retrieve the result when necessary.

  • submit a Command

  • receive a Job ID

  • check status

  • wait if needed

  • retrieve the result

With this model, UI clients, CLI tools, external systems, and AI agents can use the same Job Management interface.
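
The submit, check, wait, and retrieve steps can be sketched with an in-memory service. InMemoryJobService and its method names are hypothetical, and a background thread stands in for the worker. A real client would call whatever interface CNCF exposes.

```python
import threading
import time
import uuid
from typing import Any, Callable


class InMemoryJobService:
    """Hypothetical stand-in for a Job Management endpoint."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, command: Callable[[], Any]) -> str:
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "Accepted", "result": None, "error": None}
        threading.Thread(target=self._run, args=(job_id, command), daemon=True).start()
        return job_id   # returned before the Command has finished

    def _run(self, job_id: str, command: Callable[[], Any]) -> None:
        self._update(job_id, status="Running")
        try:
            result = command()
        except Exception as exc:   # recorded as the failure reason
            self._update(job_id, status="Failed", error=repr(exc))
        else:
            self._update(job_id, status="Succeeded", result=result)

    def _update(self, job_id: str, **changes: Any) -> None:
        with self._lock:           # status and result change together
            self._jobs[job_id].update(changes)

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._jobs[job_id]["status"]

    def wait(self, job_id: str, timeout: float = 5.0, interval: float = 0.01) -> str:
        """Poll until the Job reaches a terminal state or the timeout expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            current = self.status(job_id)
            if current in ("Succeeded", "Failed", "Cancelled"):
                return current
            time.sleep(interval)
        raise TimeoutError(f"job {job_id} did not finish within {timeout}s")

    def result(self, job_id: str) -> Any:
        with self._lock:
            return self._jobs[job_id]["result"]


if __name__ == "__main__":
    service = InMemoryJobService()
    job_id = service.submit(lambda: sum(range(10)))   # submit, receive a Job ID
    if service.wait(job_id) == "Succeeded":           # wait if needed
        print(service.result(job_id))                 # retrieve the result
```

Polling is used here only because it is the simplest mechanism to show. The article does not say whether CNCF offers polling, callbacks, or both.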

Five-Layer Runtime Model

CNCF organizes execution around the following five-layer model.

Saga
 └─ Job
      └─ Task
           └─ Transaction
                └─ Event

This diagram shows containment, but at runtime an Event can also become the boundary that starts subsequent Tasks. Typically, the Task containing the Command body runs synchronously, establishes a Transaction, and publishes Events. After that, Event handlers, external notifications, aggregate updates, index updates, and similar follow-up work can either continue synchronously or run asynchronously as separate Tasks, depending on the Event definition or subscription definition.

  • JobAsync

    • Task A : Command body. It runs synchronously inside the worker.

    • Transaction : consistency boundary.

    • Event : published as a result of the Transaction.

    • Synchronous subscriber: runs in the same execution flow.

    • Asynchronous subscriber: separated as Task B or continuation processing.
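
The split between synchronous and asynchronous subscribers in the outline above can be sketched as follows. EventBus, Task, and run_deferred are hypothetical names. The sketch assumes the subscription definition decides the mode, which is one of the two options the article mentions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    name: str
    payload: dict


@dataclass
class Task:
    name: str
    work: Callable[[], None]
    status: str = "Scheduled"


@dataclass
class EventBus:
    """Hypothetical dispatcher; the subscription decides sync vs. async."""
    sync_handlers: dict = field(default_factory=dict)
    async_handlers: dict = field(default_factory=dict)
    deferred_tasks: list = field(default_factory=list)

    def subscribe(self, event_name: str, handler: Callable[[Event], None],
                  asynchronous: bool = False) -> None:
        table = self.async_handlers if asynchronous else self.sync_handlers
        table.setdefault(event_name, []).append(handler)

    def publish(self, event: Event) -> None:
        for handler in self.sync_handlers.get(event.name, []):
            handler(event)   # runs in the publisher's execution flow
        for handler in self.async_handlers.get(event.name, []):
            # Separated into an independent Task (Task B in the outline).
            self.deferred_tasks.append(
                Task(name=f"on:{event.name}", work=lambda h=handler: h(event))
            )

    def run_deferred(self) -> None:
        """Stand-in for a worker draining the asynchronous Tasks."""
        while self.deferred_tasks:
            task = self.deferred_tasks.pop(0)
            task.status = "Running"
            task.work()
            task.status = "Succeeded"
```

After publish returns, synchronous subscribers have already run, while asynchronous ones exist only as queued Tasks until a worker picks them up.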

Saga

Manages long-running business processes that span multiple subsystems. A Saga achieves eventual consistency through compensation rather than distributed transactions.
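
Compensation can be illustrated with a generic sketch of the Saga pattern: each step carries an undo action, and a failure triggers the undo actions of completed steps in reverse order. run_saga and the step tuple layout are hypothetical and do not reflect how CNCF defines Sagas.

```python
from typing import Callable

# (name, action, compensation): the layout is an assumption for this sketch.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()   # undo work that was already committed
                log.append(f"compensated:{done_name}")
            return log
        completed.append(step)
        log.append(f"done:{name}")
    return log
```

No step is rolled back by a shared transaction. Each one commits locally and is undone by an explicit compensating action, which is why the overall result is eventually consistent.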

Job

A Job is the execution-management unit within a subsystem.

Task

An observable unit of work within a Job.

A Task is not necessarily one-to-one with the entire Job. By separating the Task that executes the Command body from derived Tasks triggered after Event publication, CNCF can observe, retry, and diagnose the primary work and follow-up side effects independently. However, not every Event subscription becomes a separate Task. Synchronous subscriptions run in the same execution flow, while only asynchronous subscriptions are managed as independent Tasks.
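
The benefit of separating Tasks can be sketched as per-Task retry: when a follow-up Task fails, only that Task is re-run, and the Command body is left alone. TaskRun and JobRun are hypothetical names, and the retry policy is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskRun:
    name: str
    work: Callable[[], None]
    status: str = "Scheduled"
    attempts: int = 0
    last_error: str = ""

    def run(self) -> None:
        self.attempts += 1
        try:
            self.work()
        except Exception as exc:
            self.status, self.last_error = "Failed", repr(exc)
        else:
            self.status, self.last_error = "Succeeded", ""


@dataclass
class JobRun:
    tasks: list = field(default_factory=list)

    def run_all(self) -> None:
        for task in self.tasks:
            task.run()

    def retry_failed(self) -> list:
        """Re-run only the Tasks that failed; succeeded Tasks are untouched."""
        retried = [t for t in self.tasks if t.status == "Failed"]
        for task in retried:
            task.run()
        return [t.name for t in retried]
```

Because each Task keeps its own status, attempt count, and last error, the primary work and its side effects can be diagnosed separately.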

Transaction

A consistency boundary. It is independent of the Task abstraction.

Event

Represents a state change or business fact.

An Event is not only a historical record; it can also act as a connection point that starts follow-up processing. Follow-up processing at Event publication can be synchronous or asynchronous. Synchronous processing runs within the publisher’s execution flow and completes within the same Task or close to the Transaction boundary. Asynchronous processing is started as a separate Task triggered by the Event and is managed independently of the caller’s completion. This allows CNCF to separate the synchronous core of a Command from the derived processing, synchronous or asynchronous, that follows Event publication.

Summary

CNCF Job Management separates business intent from execution management. A Command represents what should be done, while a Job manages how it was executed, what state it is in, and what result it produced.

This enables synchronous execution, synchronous execution with history, asynchronous execution, and synchronous execution with asynchronous continuation to be handled through a consistent model. The five-layer Saga–Job–Task–Transaction–Event model further separates long-running processes, execution management, work units, consistency boundaries, and business events.

In the AI era, the execution data retained by Job Management becomes an important knowledge source for failure analysis, auditing, performance analysis, and operational automation.

References

Glossary

Cloud Native Component Framework (CNCF)

Cloud Native Component Framework (CNCF) is a framework for executing cloud application components using a single, consistent execution model. Centered on the structure of Component, Service, and Operation, it enables the same Operation to be reused across different execution forms such as command, server (REST / OpenAPI), client, and script. By centralizing quality attributes required for cloud applications—such as logging, error handling, configuration, and deployment—within the framework, components can focus on implementing domain logic. CNCF is designed as an execution foundation for literate model-driven development and AI-assisted development, separating what is executed from how it is invoked.

error

A generic term used broadly in practice. In software engineering, it is ambiguous and may denote bugs or failures in general. In SimpleModeling, Error is treated as a broad label, with specifics clarified as Mistake, Defect, Fault, Failure, or Deviation.