Interpreting KubeVela 1.7: Taking Over Your Existing Workloads

The KubeVela 1.7 version has been officially released for some time, during which KubeVela has been officially promoted to a CNCF incubation project, marking a new milestone. KubeVela 1.7 itself is also a turning point because KubeVela has been focusing on the design of an extensible system from the beginning, and the demand for the core functionality of controllers has gradually converged, freeing up more resources to focus on user experience, ease of use, and performance. In this article, we will focus on highlighting the prominent features of version 1.7, such as workload takeover and performance optimization.

Taking Over Your Existing Workloads

Taking over existing workloads has always been a highly demanded requirement within the community, with a clear scenario: existing workloads can be naturally migrated to the OAM standard system and be managed uniformly by KubeVela's application delivery control plane. The workload takeover feature also allows reuse of VelaUX's UI console functions, including a series of operations and maintenance characteristics, workflow steps, and a rich plugin ecosystem. In version 1.7, we officially released this feature. Before diving into the specific operation details, let's first have a basic understanding of its operation mode.

"read-only" and "take-over" policy

To meet the needs of different usage scenarios, KubeVela provides two modes for unified management. One is the "read-only" mode, which is suitable for systems that already have a self-built platform internally and still have the main control capability for existing businesses. The new KubeVela-based platform system can only observe these applications in a read-only manner. The other mode is the "take-over" mode, which is suitable for users who want to directly migrate their workloads to the KubeVela system and achieve complete unified management.

The "read-only" mode, as the name suggests, does not perform any "write" operations on resources. Workloads managed in read-only mode can be visualized through KubeVela's toolset (such as CLI and VelaUX), which satisfies the need for unified viewing and observability. At the same time, when the managed application generated under read-only mode is deleted, the underlying workload resources will not be reclaimed. If the underlying workload is artificially modified by other controllers, KubeVela can also observe these changes.
The "take-over" mode means that the underlying workloads will be fully managed by KubeVela, just like other workloads created directly through the KubeVela system. The update, deletion, and other lifecycle of the workloads will be fully controlled by KubeVela's application system. By default, modifications to workloads by other systems will no longer take effect and will be changed back by KubeVela's end-state control loop, unless you add other management policies (such as apply-once).

The method of declaring the take-over mode uses KubeVela's policy system, as shown below:

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: read-only
spec:
  components:
    - name: nginx
      type: webservice
      properties:
        image: nginx
  policies:
    - type: read-only
      name: read-only
      properties:
        rules:
          - selector:
              resourceTypes: ["Deployment"]

In the "read-only" policy, we have defined multiple read-only rules. For example, if the read-only selector hits a "Deployment" resource, it means that only the resources related to Deployment are read-only. We can still create and modify resources such as "Ingress" and "Service" using operational features. However, modifying the number of instances of Deployment using the "scaler" operational feature will not take effect.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: take-over
spec:
  components:
    - name: nginx-take-over
      type: k8s-objects
      properties:
        objects:
          - apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: nginx
      traits:
        - type: scaler
          properties:
            replicas: 3
  policies:
    - type: take-over
      name: take-over
      properties:
        rules:
          - selector:
              resourceTypes: ["Deployment"]

In the "take-over" policy, we also include a series of selectors to ensure that the resources being taken over are controllable. In the example above, without the "take-over" policy, the operation would fail if there is already a Deployment resource named "nginx" in the system, because the resource already exists. On one hand, the take-over policy ensures that already existing resources can be included in management when creating applications; on the other hand, it also allows the reuse of previously existing workload configurations, and only modifications to the number of instances in operational features such as "scaler" will be "patched" to the original configuration as part of the new configuration.

Use command line to take over workloads

After learning about the take-over mode, you might wonder if there is an easy way to take over workloads with a single command. Yes, KubeVela's command line provides such a convenient way to take over workloads such as common K8s resources and "Helm". It is very easy to use. Specifically, the vela CLI automatically recognizes the resources in the system and assembles them into an application for take-over. Our core principle in designing this feature is that "taking over resources cannot trigger a restart of the underlying workloads".

As shown below, by default, using vela adopt will manage the resources in "read-only" mode. Simply specify the type, namespace, and name of the native resources you want to take over, and an Application object for takeover will be automatically generated. The generated application spec is strictly consistent with the actual fields in the cluster.

$ vela adopt deployment/default/example configmap/default/example
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  labels:
    app.oam.dev/adopt: native
  name: example
  namespace: default
spec:
  components:
  - name: example.Deployment.example
    properties:
      objects:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: example
          namespace: default
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: example
          template:
            metadata:
              labels:
                app: example
            spec:
              containers:
              - image: nginx
                imagePullPolicy: Always
                name: nginx
              restartPolicy: Always
          ...
    type: k8s-objects
  - name: example.config
    properties:
      objects:
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: example
          namespace: default
    type: k8s-objects
  policies:
  - name: read-only
    properties:
      rules:
      - selector:
          componentNames:
          - example.Deployment.example
          - example.config
    type: read-only

The currently supported default takeover types and their corresponding resource API names are as follows:

crd: ["CustomResourceDefinition"]
ns: ["Namespace"]
workload: ["Deployment", "StatefulSet", "DaemonSet", "CloneSet"]
service: ["Service", "Ingress", "HTTPRoute"]
config: ["ConfigMap", "Secret"]
sa: ["ServiceAccount", "Role", "RoleBinding", "ClusterRole", "ClusterRoleBinding"]
operator: ["MutatingWebhookConfiguration", "ValidatingWebhookConfiguration", "APIService"]
storage: ["PersistentVolume", "PersistentVolumeClaim"]

If you want to change the application to takeover mode and deploy it directly to the cluster, just add a few parameters:

vela adopt deployment/default/example --mode take-over --apply

In addition to native resources, the vela command line also supports taking over workloads created by Helm applications by default.

vela adopt mysql --type helm --mode take-over --apply --recycle -n default

The above command will manage the "mysql" Helm release in the "default" namespace through "take-over" mode. Specifying --recycle will clean up the original Helm release metadata after a successful deployment.

Once the workloads have been taken over, the corresponding KubeVela Applications have been generated, and the related operations have been integrated with the KubeVela system. You can see the taken-over applications on the VelaUX interface, and also view and operate the applications through other vela command line functions.

You can also use a command to take over all the workloads in your namespace in batches. Based on KubeVela's resource topology capabilities, the system will automatically recognize the associated resources and form a complete application. For custom resources such as CRDs, KubeVela also supports custom rules for association relationships.

vela adopt --all --apply

This command will automatically recognize the resources and their association relationships in the current namespace based on the built-in resource topology rules, and take over the applications accordingly. Taking a Deployment as an example, the automatically taken over application looks like the following, which not only takes over the main workload Deployment but also its corresponding resources, including ConfigMap, Service, and Ingress.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: test2
  namespace: default
spec:
  components:
  - name: test2.Deployment.test2
    properties:
      objects:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: test2
          namespace: default
        spec:
          ...
    type: k8s-objects
  - name: test2.Service.test2
    properties:
      objects:
      - apiVersion: v1
        kind: Service
        metadata:
          name: test2
          namespace: default
        spec:
          ...
    type: k8s-objects
  - name: test2.Ingress.test2
    properties:
      objects:
      - apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          name: test2
          namespace: default
        spec:
          ...
    type: k8s-objects
  - name: test2.config
    properties:
      objects:
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: record-event
          namespace: default
    type: k8s-objects
  policies:
  - name: read-only
    properties:
      rules:
      - selector:
          componentNames:
          - test2.Deployment.test2
          - test2.Service.test2
          - test2.Ingress.test2
          - test2.config
    type: read-only

The demonstration result is shown below:

If you want to use custom resource topology relationships to take over custom resources, you can use the following command:

vela adopt <your-crd> --all --resource-topology-rule=<your-rule-file.cue>

Flexible Definition of Takeover Rules

Given KubeVela's highly extensible design principles, the workloads and takeover methods faced by resource takeover are also different. Therefore, we have also designed a fully extensible and programmable way to take over workloads. In fact, the one-click takeover capability of the command line is also based on KubeVela's extensible takeover rules as a special case. The core idea is to define a configuration transformation rule through CUE, and then specify the transformation rule when executing the vela adopt command, as shown below.

vela adopt deployment/my-workload --adopt-template=my-adopt-rule.cue

This mode is only suitable for advanced users, and we will not go into too much detail here. If you want to learn more details, you can refer to the official documentation on workload takeover.

Significant Performance Optimization

Performance optimization is also a major highlight of this version. Based on the practical experience of different users in the community, we have improved the overall performance of the controller, the capacity of a single application, and the overall application processing throughput by 5 to 10 times without changing the default resource quotas. This also includes some changes to default configurations, with parameter trimming for some niche scenarios that affect performance.

In terms of single application capacity, because KubeVela applications may contain a large number of actual Kubernetes APIs, this often leads to the ResourceTracker behind the application that records the actual resource status and the ApplicationRevision object that records version information exceeding the 2MB limit of a single Kubernetes object. In version 1.7, we have added zstd compression functionality and enabled it by default, which directly compresses the size of resources by nearly 10 times. This also means that the resource capacity that a single KubeVela Application can support has increased by 10 times.

In addition, for some scenarios such as recording application and component versions, these version records themselves will increase proportionally with the number of applications, such as default recording of 10 application versions, which will increase by a factor of ten with the number of applications. Due to the list-watch mechanism of the controller itself, these additional objects will occupy controller memory, leading to a significant increase in memory usage. Many users (such as GitOps users) may have their own version management system. In order to avoid waste of memory, we have changed the default upper limit of application version records from 10 to 2. For component versions, which are relatively niche, we have disabled them by default. This reduces the controller's overall memory consumption to one-third of the original.

In addition, some parameter adjustments have been made, including reducing the number of historical versions recorded in the Definition from 20 to 2, and increasing the default Kubernetes API interaction limit QPS from 100 to 200, among others. In future versions, we will continue to optimize the performance of the controller.

Usability Improvement

In addition to the core feature updates and performance improvements, this release also includes enhancements to the usability of many features.

Client-side multi-environment resource rendering

"Dry run" is a popular concept in Kubernetes, which refers to the practice of running resources in a dry, non-destructive mode before they are actually applied to the cluster, to check if the configurations are valid. KubeVela also provides this feature, which not only checks if resources can be run, but also translates the abstract applications of OAM into the Kubernetes native API, allowing the client to convert from application abstraction to actual resources. The new feature added in version 1.7 is to specify different files for dry run, generating different actual resources.

For example, we can write different policy and workflow files for different environments, such as "test-policy.yaml" and "prod-policy.yaml". In this way, the same application can be specified with different policies and workflows in the client, generating different underlying resources, such as:

Running in test environment

vela dry-run  -f app.yaml -f test-policy.yaml -f test-workflow.yaml

Running in production environment

vela dry-run  -f app.yaml -f prod-policy.yaml -f prod-workflow.yaml

Here is the content of app.yaml, which specifies an external workflow to be referenced:

# app.yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: first-vela-app
spec:
  components:
    - name: express-server
      type: webservice
      properties:
        image: oamdev/hello-world
        ports:
         - port: 8000
           expose: true
      traits:
        - type: scaler
          properties:
            replicas: 1
  workflow:
    ref: deploy-demo

And the content of prod-policy.yaml and prod-workflow.yaml are as follows:

apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: env-prod
type: topology
properties:
  clusters: ["local"]
  namespace: "prod"
---
apiVersion: core.oam.dev/v1alpha1
kind: Policy
metadata:
  name: ha
type: override
properties:
  components:
  - type: webservice
    traits:
    - type: scaler
      properties:
        replicas: 2

apiVersion: core.oam.dev/v1alpha1
kind: Workflow
metadata:
  name: deploy-demo
  namespace: default
steps:
  - type: deploy
    name: deploy-prod
    properties:
      policies: ["ha", "env-prod"]

The corresponding YAML files for the test environment can be modified in the same way by changing the parameters. This feature is particularly useful for users who use KubeVela as a client abstraction tool and combine it with tools such as Argo to synchronize resources.

Enhanced Application Deletion Feature

In many special scenarios, deleting applications has always been a painful experience. In version 1.7, we have added some convenient ways to support smooth application deletion for various special cases.

Deleting certain workloads in case of cluster disconnection: We provide an interactive method to delete resources, which allows users to select the underlying workloads by viewing the cluster name, namespace, and resource type, to remove resources involved in special scenarios such as cluster disconnection.

Retain underlying resources when deleting an application: If you only want to delete the metadata of the application while keeping the underlying workload and configuration, you can use the --orphan flag to retain the underlying resources when deleting the application.
Deleting application when controller is uninstalled: When you have uninstalled the KubeVela controller but found some remaining applications that were not cleaned up, you can use --force flag to delete these applications.

Custom output after addon installation

For the KubeVela plugin system, we have added a NOTES.cue file that allows plugin makers to dynamically output installation completion prompts. For example, for the Backstage plugin, the NOTES.cue file is as follows:

info: string
if !parameter.pluginOnly {
    info: """
        By default, the backstage app is strictly serving in the domain `127.0.0.1:7007`, check it by:
                    
            vela port-forward addon-backstage -n vela-system
        
        You can build your own backstage app if you want to use it in other domains. 
        """
}
if parameter.pluginOnly {
    info: "You can use the endpoint of 'backstage-plugin-vela' in your own backstage app by configuring the 'vela.host', refer to example https://github.com/wonderflow/vela-backstage-demo."
}
notes: (info)

This plugin's output will be displayed with different content based on the parameters used by the user during plugin installation.

Enhanced Workflow Capabilities

In version 1.7, we have enhanced the workflow capabilities with more granular options:

Support specifying a failed step for retry

vela workflow restart <app-name> --step=<step-name>

The step name of a workflow can be left blank, and it will be generated automatically by the webhook.
The parameter passing in workflows now supports overriding existing parameters.

In addition, we have added a series of new workflow steps in this version, including the typical built-push-image step that allows users to build an image and push it to a registry within the workflow. During the execution of the workflow steps, users can check the logs of a specific step using the vela workflow logs <name> --step <step-name> command. The full list of built-in workflow steps can be found in the official documentation.

Enhanced VelaUX Capabilities

The VelaUX console has also been enhanced in version 1.7, including:

Enhanced application workflow orchestration capabilities, supporting full workflow capabilities including sub-steps, input/output, timeouts, conditional statements, step dependencies, etc. Application workflow status viewing is also more comprehensive, with historical workflow records, step details, step logs, and input/output information available for query.
Support for application version regression, allowing users to view the differences between multiple versions of an application and select a specific version to roll back to.
Support for multi-tenancy, with stricter limits on multi-tenant permissions aligned with the Kubernetes RBAC model.

What's Next

Recently, the official version of KubeVela 1.8 is also in full swing and is expected to meet you at the end of March. We will further enhance the following aspects:

Enhance the scalability and stability of the KubeVela core controller, provide a sharding solution for controller horizontal scaling, optimize the controller performance and mapping for 10,000-level application scale in multi-cluster scenarios, and provide a new performance evaluation for the community. VelaUX supports out-of-the-box gray release capabilities, and interfaces with observability plugins for interactive release processes. At the same time, VelaUX forms an extensible framework system, providing configuration capabilities for customizing UI and supporting business custom extensions and interfaces. Enhance the GitOps workflow capabilities to support the complete VelaUX experience for applications synchronized with Git repositories. If you want to learn more about our plans, become a contributor, or partner with us, you can contact us through community communication (https://github.com/kubevela/community), and we look forward to your participation!

Taking Over Your Existing Workloads​

"read-only" and "take-over" policy​

Use command line to take over workloads​

Flexible Definition of Takeover Rules​

Significant Performance Optimization​

Usability Improvement​

Client-side multi-environment resource rendering​

Enhanced Application Deletion Feature​

Custom output after addon installation​

Enhanced Workflow Capabilities​

Enhanced VelaUX Capabilities​

What's Next​