HyperShift / Hosted Control Planes (HCP) - Onboarding Guide
How this guide relates to other docs
This is a curated learning path — it provides a structured narrative to help newcomers build a mental model of HyperShift step by step. It intentionally summarizes topics that are covered in more detail in dedicated reference pages. Where applicable, "See also" links point you to the authoritative source for deeper reading. This guide is not a replacement for those docs.
Table of Contents
- What is HyperShift?
- Key Concepts
- Overall Architecture
- Main Components
- HostedCluster Lifecycle
- Control Plane in Detail
- Data Plane and Node Management
- Supported Cloud Platforms
- APIs and Code Structure
- Development Workflow
- Common Development Patterns
- Architectural Invariants
- Key File Reference
- Recommended Learning Path
1. What is HyperShift?
HyperShift is middleware for hosting OpenShift control planes at scale: it decouples the control plane from the data plane (worker nodes), allowing control planes to run as ordinary workloads on a central management cluster.
The Problem it Solves
In standalone OpenShift, every cluster runs its own control plane (etcd, kube-apiserver, controllers) on dedicated master nodes. This creates:
- Resource overhead: 3+ master nodes per cluster just for the control plane
- Provisioning time: 30-45 minutes including bootstrap
- Distributed operations: each control plane is independently operated
How HyperShift Solves It
In standalone OpenShift, each cluster embeds its own control plane on dedicated master nodes. In the HyperShift model, a single management cluster hosts all control planes as pods:
| Aspect | Standalone OpenShift | HyperShift (HCP) |
|---|---|---|
| Control plane location | Dedicated master nodes inside the cluster | Pods on a management cluster |
| Master nodes | 3+ required | Zero; only worker nodes in the guest cluster |
| Provisioning time | 30-45 minutes | ~10-15 minutes |
| Control plane isolation | Physical/VM | Namespace + NetworkPolicies |
| Upgrade model | Single upgrade for CP + workers | Independent upgrades for CP vs data plane |
2. Key Concepts
See also: Concepts and Personas for detailed persona definitions (Service Provider, Consumer, Instance Admin) and Controller Architecture for detailed controller diagrams and resource dependency graphs.
(Diagram: the HostedCluster and NodePool CRs live in the `clusters` namespace and are reconciled by the HyperShift Operator in the `hypershift` namespace. The HO creates a HostedControlPlane in the CP namespace `clusters-my-cluster` and deploys the Control Plane Operator, which manages etcd, kube-apiserver, kube-controller-manager, kube-scheduler, openshift-apiserver, the PKI Operator, and the Ignition Server. The kube-apiserver reaches the guest cluster's worker nodes over konnectivity tunnels, and the Ignition Server delivers ignition configs to those nodes.)
Glossary
For full persona and concept definitions, see Concepts and Personas.
| Term | Description | Start Reading Here |
|---|---|---|
| Management Cluster | The OpenShift/K8s cluster where HyperShift operators run and where control plane pods live | hypershift-operator/main.go |
| Guest Cluster (Hosted Cluster) | The cluster end users consume. Only has worker nodes | - |
| HostedCluster (HC) | User-facing CRD declaring the intent to create a cluster. Lives in the user's namespace | api/hypershift/v1beta1/hostedcluster_types.go |
| HostedControlPlane (HCP) | Internal CRD created by the HC controller. Lives in the control plane namespace. The CPO reads it to know what to deploy | api/hypershift/v1beta1/hosted_controlplane.go |
| NodePool (NP) | User-facing CRD defining a scalable set of worker nodes. References a HostedCluster | api/hypershift/v1beta1/nodepool_types.go |
| Control Plane Namespace | Auto-created namespace (`{hc-ns}-{hc-name}`) where all control plane components live | hypershift-operator/controllers/manifests/manifests.go |
| HyperShift Operator (HO) | Main operator managing HostedClusters and NodePools | hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go |
| Control Plane Operator (CPO) | Runs inside each CP namespace and manages all control plane components | control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go |
| PKI Operator | Certificate operator handling rotation and signing | control-plane-pki-operator/operator.go |
| Ignition Server | HTTPS server that serves ignition configs to worker nodes during bootstrap | ignition-server/cmd/start.go |
3. Overall Architecture
See also: Controller Architecture for detailed controller diagrams and resource dependency graphs.
(Diagram: the management cluster runs the HyperShift Operator Deployment. The user-facing `clusters` namespace holds the HostedCluster `my-cluster`, the NodePools `my-cluster-workers` and `my-cluster-infra`, and the kubeconfig and kubeadmin-password Secrets. The control plane namespace `clusters-my-cluster` holds the HostedControlPlane plus the components deployed by the CPO — etcd StatefulSet, kube-apiserver, kube-controller-manager, kube-scheduler, openshift-apiserver, oauth-server, cluster-version-operator, cluster-network-operator, cloud-controller-manager, konnectivity-server, ignition-server, and 30+ more — alongside the control-plane-operator, control-plane-pki-operator, and the CAPI manager and provider. The guest cluster contains only workers: kubelet, konnectivity-agent, OVN/SDN agents, and user workloads. The HO watches and reconciles the HC and NodePools, creates the CP namespace and its resources, and deploys the CPO; the konnectivity-server tunnels to the konnectivity-agents, and the kube-apiserver reaches kubelets through that tunnel.)
Namespace Layout
- `hypershift` - the HyperShift Operator
- `clusters` - the user namespace holding the HostedCluster and NodePool CRs
- `clusters-my-cluster`, `clusters-another-cluster` - auto-created control plane namespaces, one per HostedCluster, created by the HC controller
The namespace naming convention is implemented in hypershift-operator/controllers/manifests/manifests.go:
```go
func HostedControlPlaneNamespace(hostedClusterNamespace, hostedClusterName string) string {
	return fmt.Sprintf("%s-%s", hostedClusterNamespace, strings.ReplaceAll(hostedClusterName, ".", "-"))
}
```
Explore yourself: Read `hypershift-operator/controllers/manifests/manifests.go` to see all the naming helpers used across the codebase.
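To make the convention concrete, here is a standalone sketch that copies the helper shown above so you can run it in isolation (the surrounding `main` package is illustrative, not part of the repo):

```go
package main

import (
	"fmt"
	"strings"
)

// HostedControlPlaneNamespace mirrors the helper above:
// "{hc-namespace}-{hc-name}", with dots in the cluster name replaced by dashes
// so the result is a valid namespace name.
func HostedControlPlaneNamespace(hostedClusterNamespace, hostedClusterName string) string {
	return fmt.Sprintf("%s-%s", hostedClusterNamespace, strings.ReplaceAll(hostedClusterName, ".", "-"))
}

func main() {
	fmt.Println(HostedControlPlaneNamespace("clusters", "my-cluster"))   // clusters-my-cluster
	fmt.Println(HostedControlPlaneNamespace("clusters", "team.cluster")) // clusters-team-cluster
}
```

Note that two HostedClusters named `a.b` and `a-b` in the same namespace would collide under this scheme, which is one reason naming helpers are centralized in a single file.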
4. Main Components
See also: Controller Architecture for detailed controller diagrams, resource dependency graphs, and the Hosted Cluster Config Operator.
4.1 HyperShift Operator (HO)
- Entry point: `hypershift-operator/main.go`
- Deployed in: the `hypershift` namespace as a Deployment
Contains the main controllers:
| Controller | Directory | Function | Read This First |
|---|---|---|---|
| HostedCluster | `hypershift-operator/controllers/hostedcluster/` | Manages the full HC lifecycle, creates the CP namespace, deploys the CPO | `hostedcluster_controller.go` (start at the `Reconcile` method, ~line 337) |
| NodePool | `hypershift-operator/controllers/nodepool/` | Manages CAPI Machines, ignition tokens, rolling upgrades | `nodepool_controller.go` (start at the `Reconcile` method) |
| HostedClusterSizing | `hypershift-operator/controllers/hostedclustersizing/` | Resource-based sizing decisions | `hostedclustersizing_controller.go` |
| Scheduler | `hypershift-operator/controllers/scheduler/` | Schedules HCs on management cluster nodes | `scheduler.go` |
| SharedIngress | `hypershift-operator/controllers/sharedingress/` | Shared ingress across multiple HCs | `sharedingress_controller.go` |
Explore yourself: The HostedCluster controller (`hostedcluster_controller.go`) is ~5,200 lines. Don't try to read it all at once. Start with the `Reconcile` method and follow the function calls. Key sub-functions to look at:

- `reconcileHostedControlPlane()` (~line 2404) - how the HC spec is translated to an HCP
- `reconcileControlPlaneOperator()` - how the CPO deployment is created
- `reconcileCAPICluster()` (~line 2901) - how the CAPI Cluster CR is created
- `r.delete()` (~line 501) - the deletion flow
4.2 Control Plane Operator (CPO)
- Source: `control-plane-operator/`
- Main controller: `control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` (~3,200 lines)
- Deployed in: each CP namespace (one CPO instance per hosted cluster)
The CPO reads the HostedControlPlane resource and reconciles ~40 control plane components. A handful of them (etcd, the featuregate generator, the router) have no KAS dependency; KAS is an implicit dependency for everything else. The full list includes oauth-server, oauth-apiserver, openshift-controller-manager, ingress-operator, dns-operator, machine-approver, config-operator, storage-operator, node-tuning-operator, and more.

Explore yourself: Look at the `registerComponents()` function (~line 236 in `hostedcontrolplane_controller.go`) to see the full list of registered components. Then pick one simple component like `kube-scheduler` at `control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/` to understand the pattern.
4.3 CPOv2 Framework
The declarative framework for defining control plane components. Each component uses a builder pattern:
```go
component.NewDeploymentComponent(name, opts).
	WithAdaptFunction(adaptDeployment).                 // Dynamic deployment mutations
	WithPredicate(predicate).                           // Enable/disable the component
	WithDependencies("etcd", "featuregate-generator").  // Block until deps are ready
	WithManifestAdapter("config.yaml", ...).            // Adapt supporting manifests
	InjectTokenMinterContainer(tokenOpts).              // Auto-inject token minter sidecar
	InjectKonnectivityContainer(konnOpts).              // Auto-inject konnectivity proxy
	RolloutOnConfigMapChange("my-config").              // Auto-rollout on config change
	Build()
```
| File | What it does | Why you should read it |
|---|---|---|
| `support/controlplane-component/controlplane-component.go` | Core reconcile logic, `ControlPlaneComponent` interface | Understand how components are reconciled (line 163, `Reconcile` method) |
| `support/controlplane-component/builder.go` | Builder pattern for constructing components | Learn how to create a new component |
| `support/controlplane-component/status.go` | Dependency checking, status conditions | Understand `checkDependencies` (line 50) and how Available/RolloutComplete conditions work |
| `support/controlplane-component/workload.go` | Workload (Deployment/StatefulSet) reconciliation | See how deployments are created from asset manifests |
Each component generates a ControlPlaneComponent CR with conditions:
- `ControlPlaneComponentAvailable` - at least one pod is ready
- `ControlPlaneComponentRolloutComplete` - all pods are at the desired version
Explore yourself: Compare a simple component (`v2/kube_scheduler/`) with a complex one (`v2/kas/`) to see how the framework scales. Assets (YAML manifests) live in `v2/assets/<component>/`.
4.4 PKI Operator
Directory: control-plane-pki-operator/
Manages all PKI (certificates) for the hosted cluster:
| Controller | Directory | Function |
|---|---|---|
| CertRotation | `certrotationcontroller/` | Rotates CAs and leaf certs using `library-go` |
| CertificateSigning | `certificatesigningcontroller/` | Signs CSRs for break-glass access |
| CSR Approval | `certificatesigningrequestapprovalcontroller/` | Auto-approves CSRs matching known signers |
| CertRevocation | `certificaterevocationcontroller/` | Handles certificate revocation |
| TargetConfig | `targetconfigcontroller/` | PKI target configuration |
Supports two break-glass signers: Customer and SRE.
Explore yourself: Start with `control-plane-pki-operator/operator.go` to see how all sub-controllers are wired together. Then look at `certificates/` for the signer definitions.
4.5 Ignition Server
Directory: ignition-server/
HTTPS server serving ignition configs to worker nodes during bootstrap:
(Sequence: the NodePool controller creates a token Secret carrying the release, config, and a token UUID. The `TokenSecretReconciler` detects the new Secret, and the `LocalIgnitionProvider` generates a payload with the MCO, stored in the payload store keyed by the token UUID. When the cloud instance boots, the node sends `GET /ignition` with the token UUID in a header; the server looks up the payload by token and responds with the ignition JSON. The node applies the ignition config, starts kubelet, and joins the guest cluster.)
| File | What it does |
|---|---|
| `ignition-server/cmd/start.go` | HTTPS server setup, `/ignition` request handler |
| `ignition-server/controllers/tokensecret_controller.go` | `TokenSecretReconciler`: watches token Secrets, generates payloads, rotates tokens |
| `ignition-server/controllers/local_ignitionprovider.go` | `LocalIgnitionProvider`: extracts MCO binaries from the release image, runs them to produce ignition JSON |
| `ignition-server/controllers/cache.go` | `ExpiringCache`: in-memory TTL cache for payloads |
Explore yourself: Start with `ignition-server/cmd/start.go` to understand the HTTP handler, then follow the flow into `tokensecret_controller.go`.
5. HostedCluster Lifecycle
5.1 Creation
(Sequence: the user creates a HostedCluster. The HO creates the CP namespace `{hc-ns}-{hc-name}`, copies secrets into it (pull-secret, SSH, encryption), creates the HostedControlPlane CR, deploys the CPO Deployment and the CAPI manager + provider, and configures NetworkPolicies. The CPO reads the HCP, resolves the release image, and deploys the etcd StatefulSet, the kube-apiserver, and 30+ further components, reporting status back to the HC via the HCP; the HO copies the kubeconfig to the user namespace. When the user then creates a NodePool, the HO reconciles it, creates token and userdata Secrets, and creates the CAPI MachineDeployment and MachineTemplate. CAPI provisions cloud instances, which boot, fetch their ignition config, and join the guest cluster.)
Explore yourself: Follow the creation flow step by step in `hostedcluster_controller.go`:

- `Reconcile()` (~line 337) - entry point
- `reconcileHostedControlPlane()` (~line 2404) - HCP creation
- `reconcileControlPlaneOperator()` - CPO deployment
- `reconcileCAPIManager()` - CAPI deployment
- Network policies: `hypershift-operator/controllers/hostedcluster/network_policies.go`
5.2 Steady State
- The CPO continuously reconciles all components against the HCP spec
- The PKI operator rotates certificates automatically
- The NodePool controller manages auto-repair and scaling
- Status flow: `CPO -> HCP status -> HO -> HC status`
5.3 Upgrade
See also: Upgrades for detailed upgrade procedures and version skew policies.
Control plane and data plane upgrades are decoupled:
(Control plane upgrade: changing `HC.spec.release` triggers a rolling update of the control plane components. Data plane upgrade: changing `NP.spec.release` produces a new config hash, new token/userdata Secrets, and a CAPI rolling update of the MachineDeployment. The two flows are independent.)
- `controlPlaneRelease` allows patching management-side components without touching the data plane
- NodePool releases can be updated independently (within the N-2 y-stream skew)
5.4 Deletion
(Sequence: the HO tears down cloud resources (load balancers, DNS, security groups, endpoints), cleans up cluster-wide RBAC, and removes the finalizer. A grace period is supported via the `hypershift.openshift.io/destroy-grace-period` annotation.)
Explore yourself: The deletion flow starts at `r.delete()` (~line 501 in `hostedcluster_controller.go`). Notice the `CloudResourcesDestroyed` and `HostedClusterDestroyed` conditions.
6. Control Plane in Detail
6.1 CPO Reconciliation Flow
See also: Controller Architecture for the full controller dependency graph and reconciliation details.
(Flow: on deletion, remove the finalizer; otherwise add it, then validate the configuration, check etcd status, validate the KMS config, check KAS availability, set up infrastructure status (endpoints, DNS, routes), reconcile the ignition configs and the kubeadmin password, build the `ControlPlaneContext`, and iterate over the registered components (~line 1232). For each component: if its predicate fails, its resources are deleted; if its dependencies are not ready, it is skipped and requeued; otherwise its manifests and workload are reconciled and its `ControlPlaneComponent` CR status is updated.)
Explore yourself: In `hostedcontrolplane_controller.go`, the component iteration loop is at ~line 1232:

```go
for _, c := range r.components {
	r.Log.Info("Reconciling component", "component_name", c.Name())
	if err := c.Reconcile(cpContext); err != nil {
		errs = append(errs, err)
	}
}
```
6.2 Component Dependencies
(Diagram: PKI, etcd, the featuregate generator, and the router are highlighted as having no KAS dependency; KAS itself gates the remaining ~30 components.)

KAS is an implicit dependency for all components except etcd, featuregate-generator, control-plane-operator, cluster-api, capi-provider, karpenter, and the router. This logic lives in the `checkDependencies` function (~line 50) of `support/controlplane-component/status.go`.
6.3 Status Propagation
(Flow: per-component `Available`/`RolloutComplete` conditions roll up into HCP status conditions such as `EtcdAvailable`, `KubeAPIServerAvailable`, and `ValidConfiguration`, which in turn roll up into the HC status conditions `Available`, `Progressing`, and `Degraded` that are visible to the user.)
Explore yourself:

- HC conditions are defined in `api/hypershift/v1beta1/hostedcluster_conditions.go`
- NP conditions are in `api/hypershift/v1beta1/nodepool_conditions.go`
- The CPOv2 status logic is in `support/controlplane-component/status.go`
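The roll-up idea behind this propagation can be sketched with plain structs (the real condition types live in the HyperShift API package; everything below is an illustrative simplification, not the actual aggregation code):

```go
package main

import "fmt"

// componentCondition is an illustrative stand-in for a per-component
// Available condition on a ControlPlaneComponent CR.
type componentCondition struct {
	Name      string
	Available bool
}

// aggregateAvailable sketches the propagation rule: the hosted control plane
// (and ultimately the HostedCluster) is Available only when every registered
// component reports Available; otherwise the failing components are surfaced.
func aggregateAvailable(conds []componentCondition) (bool, []string) {
	var failing []string
	for _, c := range conds {
		if !c.Available {
			failing = append(failing, c.Name)
		}
	}
	return len(failing) == 0, failing
}

func main() {
	ok, failing := aggregateAvailable([]componentCondition{
		{Name: "etcd", Available: true},
		{Name: "kube-apiserver", Available: false},
	})
	fmt.Println(ok, failing) // false [kube-apiserver]
}
```

Surfacing the failing component names is what makes a single `Available=False` on the HC actionable: the user-facing condition message can point back at the specific component that is not ready.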
7. Data Plane and Node Management
See also: NodePool Rollouts for in-depth rollout mechanics and update strategies.
7.1 NodePool - Key Fields
| Field | Purpose | Look at |
|---|---|---|
| `spec.clusterName` | Immutable reference to the HostedCluster | `api/hypershift/v1beta1/nodepool_types.go` |
| `spec.release` | Release image (change triggers a rollout; tagged `+rollout`) | Same file |
| `spec.platform` | Platform-specific machine config (AMI, instance type, etc.) | `aws.go`, `azure.go`, `kubevirt.go` in the api dir |
| `spec.replicas` / `spec.autoScaling` | Node count control | Same file |
| `spec.management.upgradeType` | `Replace` (default) or `InPlace` | Same file |
| `spec.management.autoRepair` | Enables MachineHealthCheck | Same file |
| `spec.config` | ConfigMap refs with MachineConfig (change triggers a rollout) | Same file |
7.2 Node Lifecycle
(Sequence: the NodePool controller creates a token Secret (token UUID, release, config) and a userdata Secret (an ignition stub containing the server URL and token), then creates/updates the MachineDeployment, whose `Spec.Template.Bootstrap.DataSecretName` points at the userdata Secret. CAPI creates a MachineSet and Machines and provisions instances (EC2, Azure VM, KubeVirt VM). The instance boots with the userdata, sends `GET /ignition` with the token UUID as authorization, receives the full ignition JSON from the ignition server's cache, applies it, starts kubelet, and joins the guest cluster; the machine-approver then approves the node's CSR.)
Explore yourself: The NodePool controller is split across several files. Read them in this order:
1. `hypershift-operator/controllers/nodepool/nodepool_controller.go` - main reconciler entry point, condition checks
2. `hypershift-operator/controllers/nodepool/config.go` - `ConfigGenerator`: how the config hash is computed for rollout detection
3. `hypershift-operator/controllers/nodepool/token.go` - `Token`: token Secret and userdata Secret lifecycle
4. `hypershift-operator/controllers/nodepool/capi.go` - `CAPI`: MachineDeployment, MachineSet, MachineHealthCheck, MachineTemplate creation
7.3 ClusterAPI Integration
(Diagram: in the CP namespace `clusters-my-cluster`, the NodePool controller creates a MachineDeployment, a platform MachineTemplate (e.g., AWSMachineTemplate), and a MachineHealthCheck. CAPI expands the MachineDeployment into a MachineSet and Machines; the infrastructure provider turns each Machine into an EC2 instance, Azure VM, or KubeVirt VM. The MachineTemplate is referenced by the MachineDeployment, and the MachineHealthCheck monitors the Machines.)
Rollout detection: ConfigGenerator.Hash() produces a new hash when config or version changes. New hash = new Secrets = new DataSecretName on MachineDeployment = CAPI rolling update.
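The chain "new hash → new Secrets → rolling update" can be illustrated with a toy version. The real hashing lives in `hypershift-operator/controllers/nodepool/config.go` and covers much more input; the functions below are illustrative stand-ins, not the actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash is a toy stand-in for ConfigGenerator.Hash(): any change to the
// release image or the rendered config yields a different short hash.
func configHash(releaseImage, renderedConfig string) string {
	sum := sha256.Sum256([]byte(releaseImage + "\x00" + renderedConfig))
	return hex.EncodeToString(sum[:])[:8]
}

// userDataSecretName embeds the hash in the Secret name, which is what the
// MachineDeployment's DataSecretName ultimately points at.
func userDataSecretName(nodePool, hash string) string {
	return fmt.Sprintf("user-data-%s-%s", nodePool, hash)
}

func main() {
	before := userDataSecretName("workers", configHash("4.16.0", "cfg-v1"))
	after := userDataSecretName("workers", configHash("4.16.1", "cfg-v1"))
	fmt.Println(before)
	fmt.Println(after)
	fmt.Println("rollout needed:", before != after) // rollout needed: true
}
```

Because the Secret name changes rather than the Secret contents, CAPI sees a changed `DataSecretName` on the MachineDeployment template and performs a rolling update, exactly as it would for any other template change.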
Explore yourself: Platform-specific machine template builders:
- `hypershift-operator/controllers/nodepool/aws.go` - `awsMachineTemplateSpec()`: AMI resolution, instance type, root volume, security groups
- `hypershift-operator/controllers/nodepool/azure.go` - Azure VM config
- `hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go` - KubeVirt VM config
- `hypershift-operator/controllers/nodepool/agent.go` - Agent/bare-metal label selectors
- `hypershift-operator/controllers/nodepool/gcp.go` - GCP machine config
- `hypershift-operator/controllers/nodepool/openstack.go` - OpenStack config
7.4 Auto-scaling
See also: Resource-Based Control Plane Autoscaling for detailed autoscaling configuration.
- Manual: `nodePool.spec.replicas` propagates to `MachineDeployment.Spec.Replicas`
- Cluster Autoscaler: `nodePool.spec.autoScaling.min/max` becomes CAPI annotations on the MachineDeployment
- Scale-from-zero (AWS only): capacity annotations (`vCPU`, `memoryMb`, `GPU`) in `hypershift-operator/controllers/nodepool/scale_from_zero.go`
- Karpenter (alternative): provisions nodes directly based on pending pods, bypassing MachineDeployments. See `karpenter-operator/controllers/`
Explore yourself: Karpenter integration files:
- `karpenter-operator/controllers/karpenter/karpenter_controller.go` - main reconciler
- `karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go` - ignition for Karpenter nodes
- `api/karpenter/v1beta1/` - HyperShift Karpenter API types
7.5 Auto-repair
(Flow: a Machine that stays unhealthy for 8-16 minutes is marked unhealthy; `MaxUnhealthy=2` prevents a remediation cascade. The MachineHealthCheck deletes the unhealthy Machine and CAPI creates a replacement.)
Explore yourself: MHC creation is in `hypershift-operator/controllers/nodepool/capi.go`, function `reconcileMachineHealthCheck()` (~line 649). Note the different timeouts for cloud (8 min) vs Agent/None (16 min) platforms.
8. Supported Cloud Platforms
See also: Multi-Platform Support for the platform support matrix and coupling rules. Platform-specific how-to guides are available under How-to for each cloud provider.
(Diagram: platform dispatch in `internal/platform/platform.go` fans out to AWS (self-managed and managed ROSA HCP), Azure (self-managed and managed ARO HCP), GCP (feature-gated), KubeVirt (VMs on Kubernetes), Agent (bare metal), OpenStack (feature-gated), and IBM Cloud / PowerVS.)
Platform Interface
Every platform implements this interface (defined in hypershift-operator/controllers/hostedcluster/internal/platform/platform.go):
```go
type Platform interface {
	ReconcileCAPIInfraCR(...)       // Creates/updates the CAPI infrastructure CR
	CAPIProviderDeploymentSpec(...) // Provides the CAPI provider deployment spec
	ReconcileCredentials(...)       // Copies cloud credentials to the CP namespace
	ReconcileSecretEncryption(...)  // Handles KMS-based encryption setup
	CAPIProviderPolicyRules()       // RBAC rules for the CAPI provider
	DeleteCredentials(...)          // Cleans up credentials on deletion
}
```
Platform dispatch happens in `GetPlatform()` (same file, ~line 91).
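The dispatch idea is a simple type switch on the HostedCluster's platform type. This sketch is a trimmed-down illustration, not the real `GetPlatform()` (the actual interface and constructors are far richer):

```go
package main

import (
	"errors"
	"fmt"
)

// Platform is a trimmed-down stand-in for the interface shown above.
type Platform interface {
	Name() string
}

type awsPlatform struct{}
type kubevirtPlatform struct{}

func (awsPlatform) Name() string      { return "AWS" }
func (kubevirtPlatform) Name() string { return "KubeVirt" }

// getPlatform sketches the dispatch behind GetPlatform(): pick the
// implementation matching the HostedCluster's platform type, and fail
// loudly for anything unsupported. Names here are illustrative.
func getPlatform(platformType string) (Platform, error) {
	switch platformType {
	case "AWS":
		return awsPlatform{}, nil
	case "KubeVirt":
		return kubevirtPlatform{}, nil
	default:
		return nil, errors.New("unsupported platform: " + platformType)
	}
}

func main() {
	p, err := getPlatform("AWS")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name()) // AWS
}
```

The payoff of this pattern is that the HostedCluster controller stays platform-agnostic: it calls the same interface methods regardless of which implementation the switch returned.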
Platform Comparison
| Aspect | AWS (Self-Managed) | AWS (ROSA HCP) | Azure (Self-Managed) | Azure (ARO HCP) | KubeVirt | Agent |
|---|---|---|---|---|---|---|
| Managed service | No | Yes | No | Yes | - | - |
| CAPI Provider | CAPA | CAPA | CAPZ | CAPZ | CAPK | CAPAgent |
| Identity | OIDC + IAM Roles (STS) | OIDC + IAM Roles (STS) | Workload Identity | Managed Identity | In-cluster | N/A |
| KMS | AWS KMS | AWS KMS | Azure Key Vault | Azure Key Vault | - | - |
| Private connectivity | VPC PrivateLink | VPC PrivateLink | Azure Private Link | Azure Private Link | - | - |
| Infra provisioning | EC2, VPC, ELB | EC2, VPC, ELB | Azure VMs, VNet | Azure VMs, VNet | KubeVirt VMs | Pre-provisioned |
| Cloud Controller Manager | aws-ccm | aws-ccm | azure-ccm | azure-ccm | kubevirt-ccm | none |
Explore yourself: Each platform implementation is in its own directory:
- `hypershift-operator/controllers/hostedcluster/internal/platform/aws/aws.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/kubevirt/kubevirt.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/agent/agent.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/gcp/gcp.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/openstack/openstack.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/ibmcloud/ibmcloud.go`
- `hypershift-operator/controllers/hostedcluster/internal/platform/powervs/powervs.go`
AWS Infrastructure
(Diagram: a VPC (10.0.0.0/16) with an Internet Gateway; per zone, a public subnet (10.0.X.0/20), a private subnet (10.0.128+X.0/20), and a NAT Gateway routing through the IGW; Route53 public and private zones; an S3 bucket backing the OIDC provider; and STS IAM roles — cloud-controller-manager, capa-controller-manager, control-plane-operator, aws-ebs-csi-driver, cloud-network-config, image-registry — whose trust policies reference the OIDC provider.)
Explore yourself: AWS infra provisioning CLI:
cmd/infra/aws/create.go-CreateInfra()orchestration (~line 191)cmd/infra/aws/ec2.go- VPC, subnets, gateways, route tablescmd/infra/aws/iam.go- IAM roles, OIDC provider, policy bindingscmd/infra/aws/create_iam.go- IAM creation options- API:
api/hypershift/v1beta1/aws.go-AWSRolesRef(line 512) for IAM role ARNs
AWS PrivateLink
For private clusters, HyperShift manages VPC Endpoint Services:
- HO side: hypershift-operator/controllers/platform/aws/controller.go - AWSEndpointServiceReconciler
- CPO side: control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go - Route53 records for private DNS
KubeVirt - Nested Virtualization
(Diagram: the management cluster hosts the control plane pods (KAS, etcd, CPO, ...); an infra cluster — which can be the same cluster — runs KubeVirt VMs as worker nodes, with CDI DataVolumes providing their root disks. The control plane reaches the VMs over konnectivity.)
Key difference: when KubevirtPlatformCredentials is set, VMs can run on an external infra cluster separate from the management cluster.
Explore yourself: The KubeVirt cloud controller manager lives in `control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/kubevirt/`.
Cloud Controller Managers
Each platform's CCM is a CPOv2 component in `control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/<platform>/component.go`.
Explore yourself: Compare the AWS CCM (`aws/component.go`) with the KubeVirt CCM (`kubevirt/component.go`) to see how the component framework handles different platforms.
Credential Management Pattern
(Diagram: user-provided credentials — cloud credentials, pull secret, SSH key — are copied into the CP namespace by the HO's `ReconcileCredentials`, where CPO-managed components (KAS, CCM, CSI, etc.) mount them.)
Cloud provider config generation files:
- control-plane-operator/controllers/hostedcontrolplane/cloud/aws/providerconfig.go
- control-plane-operator/controllers/hostedcontrolplane/cloud/azure/providerconfig.go
- control-plane-operator/controllers/hostedcontrolplane/cloud/openstack/providerconfig.go
9. APIs and Code Structure
9.1 Multi-Module Structure
(Diagram: the main module, `github.com/openshift/hypershift`, contains `hypershift-operator/`, `control-plane-operator/`, `karpenter-operator/`, `support/`, `cmd/`, and `vendor/` (do not modify `vendor/` directly). The API module, `github.com/openshift/hypershift/api`, has its own `api/go.mod` and `api/vendor/`, and holds `api/hypershift/v1beta1/`, `api/scheduling/v1alpha1/`, `api/certificates/v1alpha1/`, and `api/karpenter/v1beta1/`. The operators consume the API types via vendoring.)
GOLDEN RULE: After any change in `api/`, run `make update`. This runs: `api-deps` -> `workspace-sync` -> `deps` -> `api` -> `api-docs` -> `clients` -> `docs-aggregate`.

Explore yourself:

- `api/go.mod` - the separate module definition
- `api/CLAUDE.md` - API backward-compatibility rules (critical reading!)
- `hack/workspace/go.work` - Go workspace for local development across both modules
9.2 Main API Types
Explore yourself: Key API files to read:
- `api/hypershift/v1beta1/hostedcluster_types.go` - start here: `HostedClusterSpec` (~line 529), `HostedClusterStatus` (~line 2105)
- `api/hypershift/v1beta1/hosted_controlplane.go` - the HCP spec (~line 44) mirrors the HC spec
- `api/hypershift/v1beta1/nodepool_types.go` - `NodePoolSpec`; note the `+rollout` tags
- `api/hypershift/v1beta1/controlplanecomponent_types.go` - CPOv2 status tracking
- `api/hypershift/v1beta1/etcdbackup_types.go` - an example of a feature-gated type
- `api/hypershift/v1beta1/groupversion_info.go` - API group registration
9.3 Feature Gates
Feature gates control which API fields and CRD types are available:
```go
// Example: gated field
// +openshift:enable:FeatureGate=AutoNodeKarpenter
AutoNode *AutoNode `json:"autoNode,omitempty"`
```
Explore yourself: Feature gate definitions are in `api/hypershift/v1beta1/featuregates/`:

- `featureGate-Hypershift-Default.yaml` - the default feature set
- `featureGate-Hypershift-TechPreviewNoUpgrade.yaml` - the TechPreview set
- Per-gate CRD fragments are generated in `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/`
9.4 Key Annotations
Some important annotations you'll encounter (defined in api/hypershift/v1beta1/hostedcluster_types.go, lines 29-449):
| Annotation | Purpose |
|---|---|
| `hypershift.openshift.io/control-plane-operator-image` | Override the CPO image (dev/e2e) |
| `hypershift.openshift.io/restart-date` | Triggers a rolling restart of all components |
| `hypershift.openshift.io/force-upgrade-to` | Force an upgrade even if the CVO says not upgradeable |
| `hypershift.openshift.io/disable-pki-reconciliation` | Stops PKI cert regeneration |
| `resource-request-override.hypershift.openshift.io/<deploy>.<container>` | Override resource requests per container |
| `hypershift.openshift.io/topology` | `dedicated-request-serving-components` for dedicated nodes |
| `hypershift.openshift.io/cleanup-cloud-resources` | Controls cloud resource cleanup on deletion |
Explore yourself: Read the annotation constants block at the top of `hostedcluster_types.go` (lines 29-449). Each has a comment explaining its purpose.
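The resource-request-override annotation encodes its target in the key itself (`<deployment>.<container>` after the prefix). A hypothetical parser illustrates the shape of such keys — this helper is not the real implementation and naively splits on the first dot:

```go
package main

import (
	"fmt"
	"strings"
)

const overridePrefix = "resource-request-override.hypershift.openshift.io/"

// parseOverrideKey splits an annotation key of the form
// resource-request-override.hypershift.openshift.io/<deployment>.<container>
// into its deployment and container parts. Illustrative helper only; it
// assumes the deployment name itself contains no dots.
func parseOverrideKey(key string) (deployment, container string, ok bool) {
	if !strings.HasPrefix(key, overridePrefix) {
		return "", "", false
	}
	rest := strings.TrimPrefix(key, overridePrefix)
	parts := strings.SplitN(rest, ".", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	d, c, ok := parseOverrideKey(overridePrefix + "kube-apiserver.kube-apiserver")
	fmt.Println(d, c, ok) // kube-apiserver kube-apiserver true
}
```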
10. Development Workflow
See also: Run Tests, Run HyperShift Operator Locally, and Develop In-Cluster for detailed development setup guides.
10.1 Essential Commands
```sh
# Build
make build                   # All binaries
make hypershift              # CLI only
make hypershift-operator     # HO only
make control-plane-operator  # CPO only

# Test
make test                    # Unit tests with race detection
make e2e                     # Build E2E test binaries

# Code quality
make verify                  # Full verification (BEFORE PR) - includes generate, fmt, vet, lint, codespell, gitlint
make lint                    # golangci-lint
make lint-fix                # Auto-fix linting
make fmt                     # Format
make vet                     # go vet

# API and generation
make update                  # Full update after api/ changes
make api                     # Only regenerate CRDs
make generate                # go generate
make clients                 # Update generated clients
```
10.2 Workflow for API Changes
(Flow: edit the API types, run `make update`, then decide whether the change affects serialization; if it does, add a compatibility test in `*_types_test.go`. Either way, run `make test` and then create the PR.)
Explore yourself: Look at `api/hypershift/v1beta1/nodepool_types_test.go` for the serialization compatibility test pattern. It defines an N-1 struct and verifies JSON round-trip compatibility.
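The pattern looks roughly like this — the structs below are illustrative stand-ins, not the real NodePool types; see `nodepool_types_test.go` for the actual version:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CurrentSpec stands in for today's API type. A newly added optional field
// must not break clients compiled against the previous (N-1) shape.
type CurrentSpec struct {
	Replicas int    `json:"replicas"`
	NewField string `json:"newField,omitempty"`
}

// PreviousSpec stands in for the N-1 shape, which lacks the new field.
type PreviousSpec struct {
	Replicas int `json:"replicas"`
}

// roundTripsToPrevious serializes the current type, decodes it into the N-1
// type, and checks that the shared fields survive: an old client reading a
// new object must still see consistent data.
func roundTripsToPrevious(cur CurrentSpec) bool {
	data, err := json.Marshal(cur)
	if err != nil {
		return false
	}
	var old PreviousSpec
	if err := json.Unmarshal(data, &old); err != nil {
		return false
	}
	return old.Replicas == cur.Replicas
}

func main() {
	fmt.Println(roundTripsToPrevious(CurrentSpec{Replicas: 3, NewField: "x"})) // true
}
```

This is why new fields are added as optional (`omitempty` pointers or zero-value-safe scalars): an N-1 decoder simply ignores them, while a removed or retyped field would fail this round trip.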
10.3 Running Locally
```shell
# Install HyperShift in development mode
make hypershift-install-aws-dev

# Run the operator locally
make run-operator-locally-aws-dev

# Or manually
bin/hypershift install --development
bin/hypershift-operator run
```
Explore yourself: The CLI is built from `main.go` at the repo root. Subcommands live in `cmd/`:

- `cmd/cluster/` - `create cluster` and `destroy cluster` commands
- `cmd/nodepool/` - `create nodepool` and `destroy nodepool` commands
- `cmd/install/` - `install` command and CRD assets
- `cmd/infra/` - `create infra` commands per platform
- `cmd/install/assets/hypershift-operator/` - Final CRD YAML files
11. Common Development Patterns
11.1 Upsert Pattern
support/upsert/upsert.go wraps controllerutil.CreateOrUpdate to prevent reconciliation loops by copying server-defaulted fields.
```go
// Typical usage in a reconciler
result, err := r.createOrUpdate(ctx, r.client, deployment, func() error {
	// mutate the deployment here
	return nil
})
```
Explore yourself: Read `support/upsert/upsert.go` to understand the loop detection mechanism. The `CreateOrUpdateProvider` interface is injected into controllers via `SetupWithManager`.
11.2 Controller Structure
```go
import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/openshift/hypershift/support/upsert"
)

type MyReconciler struct {
	client         crclient.Client
	createOrUpdate upsert.CreateOrUpdateFN
}

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the resource
	// 2. Handle deletion (run cleanup, remove finalizer)
	// 3. Add the finalizer if missing
	// 4. Reconcile sub-resources via createOrUpdate
	// 5. Update status
	return ctrl.Result{}, nil
}
```
Explore yourself: Compare how different controllers are set up:

- `hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go` - Large, complex controller
- `hypershift-operator/controllers/hostedclustersizing/hostedclustersizing_controller.go` - Simpler controller
- `control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` - CPO main controller
11.3 Test Conventions
- Use Gherkin-style test names: `"When ... it should ..."`
- Use gomega for assertions
- Unit tests live alongside source files
- E2E tests live in `test/e2e/`
- Integration tests live in `test/integration/`
Explore yourself:

- `test/e2e/` - E2E tests covering cluster lifecycle, nodepool operations, and upgrades
- `test/integration/` - Integration tests for controller behavior
- Any `*_test.go` file alongside the source for unit test examples
11.4 API Backward Compatibility
Rules from `api/CLAUDE.md`:
- Every API type change must be safe for both N+1 (forward) and N-1 (rollback) compatibility
- Changing a value type to a pointer (e.g., `int32` to `*int32`) requires `omitempty`
- Never remove or rename fields
- Always add serialization compatibility tests when modifying field types
12. Architectural Invariants
See also: Goals and Design Invariants for the authoritative list of project goals and invariants.
These are the design rules that should inform every decision:
1. **Unidirectional communication**: Management cluster -> hosted cluster, never the reverse. All communication originates from within the CP namespace.
2. **Pristine workers**: Compute nodes run only user workloads plus minimal agents (kubelet, konnectivity-agent, CNI). No control plane logic.
3. **No mutable CRDs/CRs exposed**: The hosted cluster should not expose mutable resources that could interfere with HyperShift-managed features.
4. **Data plane changes do not trigger management-side lifecycle actions**: Prevents cascading failures.
5. **No user credential management**: HyperShift components do not own credentials; they copy and use them, but ownership remains with the user.
6. **Namespace isolation**: Each CP namespace is isolated via NetworkPolicies and Linux container primitives. See `hypershift-operator/controllers/hostedcluster/network_policies.go`.
7. **Decoupled upgrade signals**: Management-side and data-plane components upgrade independently via `controlPlaneRelease`.
8. **CPO backward compatibility**: The HO may deploy older CPO versions, so changes to the HO must consider their impact on older CPOs. The HO checks CPO image labels before enabling features (e.g., `controlPlanePKIOperatorSignsCSRs`, `useRestrictedPSA`, `defaultToControlPlaneV2`).
13. Key File Reference
APIs
| File | Contents | Priority |
|---|---|---|
| `api/hypershift/v1beta1/hostedcluster_types.go` | HostedCluster spec/status, platform configs, constants, annotations | Must read |
| `api/hypershift/v1beta1/hostedcluster_conditions.go` | HC condition type constants | Must read |
| `api/hypershift/v1beta1/hosted_controlplane.go` | HostedControlPlane spec/status | Must read |
| `api/hypershift/v1beta1/nodepool_types.go` | NodePool spec/status | Must read |
| `api/hypershift/v1beta1/nodepool_conditions.go` | NP condition type constants | Must read |
| `api/hypershift/v1beta1/aws.go` | AWS types (`AWSPlatformSpec`, `AWSRolesRef`) | Read for AWS work |
| `api/hypershift/v1beta1/azure.go` | Azure types | Read for Azure work |
| `api/hypershift/v1beta1/kubevirt.go` | KubeVirt types | Read for KubeVirt work |
| `api/hypershift/v1beta1/controlplanecomponent_types.go` | CPOv2 ControlPlaneComponent CR | Good to know |
| `api/hypershift/v1beta1/etcdbackup_types.go` | Feature-gated type example | Good to know |
| `api/hypershift/v1beta1/groupversion_info.go` | API group registration | Reference |
| `api/CLAUDE.md` | API compatibility rules | Must read |
HO Controllers
| File | Contents | Priority |
|---|---|---|
| `hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go` | HC Reconciler (~5200 lines) | Must read (selectively) |
| `hypershift-operator/controllers/hostedcluster/network_policies.go` | Namespace isolation NetworkPolicies | Good to know |
| `hypershift-operator/controllers/nodepool/nodepool_controller.go` | NP Reconciler main entry | Must read |
| `hypershift-operator/controllers/nodepool/config.go` | ConfigGenerator, rollout hash | Must read |
| `hypershift-operator/controllers/nodepool/token.go` | Token and UserData Secrets | Must read |
| `hypershift-operator/controllers/nodepool/capi.go` | MachineDeployment, MHC, templates | Must read |
| `hypershift-operator/controllers/nodepool/aws.go` | AWS MachineTemplate builder | Read for AWS work |
| `hypershift-operator/controllers/nodepool/azure.go` | Azure MachineTemplate builder | Read for Azure work |
| `hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go` | KubeVirt MachineTemplate builder | Read for KubeVirt work |
| `hypershift-operator/controllers/nodepool/conditions.go` | SetStatusCondition helpers | Reference |
| `hypershift-operator/controllers/nodepool/version.go` | NodesInfo aggregation from CAPI Machines | Reference |
| `hypershift-operator/controllers/nodepool/scale_from_zero.go` | Scale-from-zero annotation management | Reference |
| `hypershift-operator/controllers/manifests/manifests.go` | Namespace naming, resource naming helpers | Reference |
CPO Controllers
| File | Contents | Priority |
|---|---|---|
| `control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go` | HCP Reconciler (~3200 lines) | Must read (selectively) |
| `control-plane-operator/controllers/hostedcontrolplane/v2/kas/` | kube-apiserver component (complex example) | Good to know |
| `control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/` | kube-scheduler component (simple example) | Must read |
| `control-plane-operator/controllers/hostedcontrolplane/v2/etcd/` | etcd component | Good to know |
| `control-plane-operator/controllers/hostedcontrolplane/v2/capi_manager/` | CAPI manager component | Reference |
| `control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/` | CAPI provider component | Reference |
| `control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/` | Per-platform CCMs | Read per platform |
| `control-plane-operator/controllers/hostedcontrolplane/v2/assets/` | YAML manifests for all components | Reference |
Framework and Support
| File | Contents | Priority |
|---|---|---|
| `support/controlplane-component/controlplane-component.go` | CPOv2 framework core | Must read |
| `support/controlplane-component/builder.go` | Builder pattern for components | Must read |
| `support/controlplane-component/status.go` | Status logic, dependency checking | Must read |
| `support/controlplane-component/workload.go` | Workload reconciliation | Good to know |
| `support/upsert/upsert.go` | CreateOrUpdate wrapper | Must read |
Platform Implementations
| File | Contents | Priority |
|---|---|---|
| `hypershift-operator/controllers/hostedcluster/internal/platform/platform.go` | Platform interface definition | Must read |
| `hypershift-operator/controllers/hostedcluster/internal/platform/aws/aws.go` | AWS platform impl | Read for AWS |
| `hypershift-operator/controllers/hostedcluster/internal/platform/azure/azure.go` | Azure platform impl | Read for Azure |
| `hypershift-operator/controllers/hostedcluster/internal/platform/kubevirt/kubevirt.go` | KubeVirt platform impl | Read for KubeVirt |
| `hypershift-operator/controllers/hostedcluster/internal/platform/agent/agent.go` | Agent platform impl | Read for Agent |
PKI and Ignition
| File | Contents | Priority |
|---|---|---|
| `control-plane-pki-operator/operator.go` | PKI operator wiring | Good to know |
| `control-plane-pki-operator/certrotationcontroller/` | Certificate rotation | Reference |
| `control-plane-pki-operator/certificatesigningcontroller/` | CSR signing | Reference |
| `ignition-server/cmd/start.go` | Ignition HTTPS server | Good to know |
| `ignition-server/controllers/tokensecret_controller.go` | Token reconciler | Good to know |
| `ignition-server/controllers/local_ignitionprovider.go` | MCO binary execution | Reference |
CLI and Infrastructure
| File | Contents | Priority |
|---|---|---|
| `main.go` | CLI entry point | Reference |
| `cmd/cluster/` | create/destroy cluster commands | Reference |
| `cmd/nodepool/` | create/destroy nodepool commands | Reference |
| `cmd/install/` | install command, CRD assets | Reference |
| `cmd/infra/aws/create.go` | AWS infra CLI (`CreateInfra()`) | Read for AWS |
| `cmd/infra/aws/iam.go` | AWS IAM roles and OIDC | Read for AWS |
| `cmd/infra/azure/` | Azure infra CLI | Read for Azure |
Tests
| File | Contents | Priority |
|---|---|---|
| `test/e2e/` | E2E tests (cluster lifecycle, nodepool, upgrades) | Browse for context |
| `test/integration/` | Integration tests (controller behavior) | Browse for context |
| `api/hypershift/v1beta1/nodepool_types_test.go` | Serialization compatibility test example | Must read for API changes |
14. Recommended Learning Path
Week 1-2:
1. Install HyperShift in development mode: `make hypershift-install-aws-dev`
2. Create a test HostedCluster: `bin/hypershift create cluster`
3. Explore the CRDs (HostedCluster, HCP, NodePool) and read the API type files
4. Observe pods in the CP namespace with kubectl

Week 3-4: Architecture
1. Read `hostedcluster_controller.go` and understand the reconcile loop
2. Read `hostedcontrolplane_controller.go` and understand how the CPO deploys components
3. Read `nodepool_controller.go` and understand the node flow
4. Study the CPOv2 framework in `support/controlplane-component/`
5. Read a simple v2 component, e.g. kube-scheduler

Week 5-6: Deep Dive
1. Study the Platform interface and one implementation, e.g. AWS or KubeVirt
2. Understand the ignition flow: Token -> Ignition Server -> Node
3. Study PKI and certificates
4. Make a real change: a bug fix or small feature
5. Run `make verify` and create your first PR

Week 7+: Specialization
1. Choose an area of focus: Control Plane, Data Plane / NodePool, platform-specific work, or API design
2. Read the E2E tests in `test/e2e/`
3. Contribute features and PR reviews
Suggested Reading Order for Code
For each area, follow this order to build understanding incrementally:
Control Plane path:
1. api/hypershift/v1beta1/hostedcluster_types.go (skim the Spec, focus on key fields)
2. api/hypershift/v1beta1/hosted_controlplane.go (note the similarity to HC)
3. hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (Reconcile method only)
4. control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (Reconcile and registerComponents)
5. support/controlplane-component/controlplane-component.go (core framework)
6. control-plane-operator/controllers/hostedcontrolplane/v2/kube_scheduler/ (simple component)
Data Plane path:
1. api/hypershift/v1beta1/nodepool_types.go
2. hypershift-operator/controllers/nodepool/nodepool_controller.go (Reconcile entry point)
3. hypershift-operator/controllers/nodepool/config.go (hash-based rollout)
4. hypershift-operator/controllers/nodepool/token.go (ignition tokens)
5. hypershift-operator/controllers/nodepool/capi.go (CAPI resource creation)
6. ignition-server/cmd/start.go (how nodes fetch their config)
Platform path (pick one):
1. hypershift-operator/controllers/hostedcluster/internal/platform/platform.go (interface)
2. hypershift-operator/controllers/hostedcluster/internal/platform/<your-platform>/ (implementation)
3. hypershift-operator/controllers/nodepool/<your-platform>.go (machine template)
4. control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/<your-platform>/ (CCM)
5. api/hypershift/v1beta1/<your-platform>.go (API types)
Tips for New Team Members
- Start with the CRDs: understanding `HostedCluster`, `HostedControlPlane`, and `NodePool` is 80% of the work
- Follow the data flow: HC -> HCP -> Components; NP -> CAPI -> Cloud -> Node
- Conditions are your friend: always check `.status.conditions` to understand what's happening
- Always run `make verify` before pushing
- The CP namespace is where the magic happens: `kubectl get pods -n clusters-<name>` shows you everything
- Read the tests: unit tests and E2E tests are the best living documentation
- Use `hypershift dump`: the diagnostic tool at `cmd/dump/` captures full cluster state for debugging
- Don't read 5000-line files end-to-end: follow function calls from the `Reconcile` entry point
- The API module is separate: remember to run `make update` after any change in `api/`
- Ask about invariants: when in doubt about a design decision, check whether it violates any of the architectural invariants