Skip to main content

· 13 min read
Yike Wang

Background

Helm is an application packaging and deployment tool of client side widely used in the cloud-native field. Its simple design and easy-to-use features have been recognized by users and formed its ecosystem. Up to now, thousands applications have been packaged using Helm Chart. Helm's design concept is very concise and can be summarized in the following two aspects:

  1. Packaging and templating complex Kubernetes APIs and then abstracting and simplifying them into small number of parameters

  2. Giving Application Lifecycle Solutions: Production, upload (hosting), versioning, distribution (discovery), and deployment.

These two design principles ensure that Helm is flexible and simple enough to cover all Kubernetes APIs, which solves the problem of one-off cloud-native application delivery. However, for enterprises with a certain scale, using Helm for continuous software delivery poses quite a challenge.

Challenges of the Continuous Delivery of Helm

Helm was initially designed to ensure simplicity and ease of use instead of complex component orchestration. Therefore, Helm delivers all resources to Kubernetes clusters during the application deployment. It is expected to solve application dependency and orchestration problems automatically using Kubernetes' final-state oriented self-healing capabilities. Such a design may not be a problem during its first deployment, but it is too idealistic for the enterprise with a certain scale of production environment.

On the one hand, updating all resources when the application is upgraded may easily cause overall service interruption due to the short-term unavailability of some services. On the other hand, if there is a bug in the software, it cannot be rolled back in time, which may cause more trouble and make it difficult to control. In some serious scenarios, if some configurations in the production environment have been manually modified in O&M, the original modifications would be overwritten due to the one-off deployment of Helm. What's more, the previous versions of Helm may be inconsistent with the production environment, so it cannot be recovered using rollback. All of these add up to a larger area of failure.

Thus, with a certain scale, the grayscale and rollback capabilities of the software in the production environment are extremely important, and Helm itself cannot guarantee sufficient stability.

How Do We Enable Canary-Rollout for Helm?

Typically, a rigorous software upgrade process follows a similar process: It is roughly divided into three stages. The first stage upgrades small number of pods (such as 20% ) and switches a small amount of traffic to the new version. After completing this stage, the upgrade is paused first. After manual confirmation, continue the second phase, which is to upgrade a larger proportion of pods and traffic (such as 90% ), and pause again for manual confirmation. In the final stage, the full upgrade to the new version is completed, and the verification is completed, thus the entire rollout process is completed. If any exceptions, including business metrics, are found during the upgrade, such as an increase in the CPU or memory usage rate or excessive log requests error 500, you can roll back quickly.

image

The image above is a typical canary rollout scenario, so how do we complete the above process for the Helm chart application? There are two typical ways in the industry:

  1. Modify the Helm chart to change workloads into two copies and expose different Helm parameters. During the rollout, the images, number of pods, and traffic ratio of the two workloads are continuously modified to implement the canary rollout.
  2. Modify the Helm chart to change the original basic workload to a custom workload with the same features and with phased rollout capabilities and expose the Helm parameters. It's these canary rollout CRDs that are manipulated during the canary rollout.

The two solutions are complex with considerable modification costs, especially when your Helm chart is a third-party component that cannot be modified or cannot maintain a Helm chart. Even if the two are modified, there are still stability risks compared to the original simple workload model. The reason is that Helm is only a package management tool, and it is incompatible with the canary rollout or workloads management.

When we have in-depth communication with large number of users in the community, we find that most users' applications are not complicated, among which the most are classic types (such as Deployment and StatefulSet). Therefore, through the powerful addon mechanism of KubeVela and the OpenKruise community, we have made a canary rollout KubeVela Addon for these qualified types. This addon helps you easily complete the canary rollout of the Helm chart without any migration and modification. Also, if your Helm chart is more complicated, you can customize an addon for your scenario to get the same experience. Let's take you through a practical example (using Deployment Workload as an example) to get a glimpse of the complete process.

Use KubeVela for Canary Rollout

Prepare the Environment

  • Install KubeVela
$ curl -fsSl https://static.kubevela.net/script/install-velad.sh | bash
velad install

See this document for more installation details.

  • Enable related addon
$ vela addon enable fluxcd
$ vela addon enable ingress-nginx
$ vela addon enable kruise-rollout
$ vela addon enable velaux

In this step, the following addons are started:

  1. The fluxcd addon helps us enable the capability of Helm delivery.
  2. The ingress-nginx addon is used to provide traffic management capabilities of canary rollout.
  3. The kruise-rollout provides canary rollout capability.
  4. The velaux addon provides interface operation and visualization.
  • Map the Nginx ingress-controller port to local
$ vela port-forward addon-ingress-nginx -n vela-system

First Deployment

Run the following command to deploy the Helm application for the first time. In this step, the deployment is done through KubeVela's CLI tool. If you are familiar with Kubernetes, you can also deploy through kubectl apply. The two work the same.

cat <<EOF | vela up -f -
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
name: canary-demo
annotations:
app.oam.dev/publishVersion: v1
spec:
components:
- name: canary-demo
type: helm
properties:
repoType: "helm"
url: "https://wangyikewxgm.github.io/my-charts/"
chart: "canary-demo"
version: "1.0.0"
traits:
- type: kruise-rollout
properties:
canary:
# The first batch of Canary releases 20% Pods, and 20% traffic imported to the new version, require manual confirmation before subsequent releases are completed
steps:
- weight: 20
# The second batch of Canary releases 90% Pods, and 90% traffic imported to the new version.
- weight: 90
trafficRoutings:
- type: nginx
EOF

In the example above, we declare an application named canary-demo, which contains a Helm-type component (KubeVela also supports other types of component deployment), and the parameter of the component contains information (such as chart address and version).

In addition, we declare the kruise-rollout OAM trait for this component, which are the capabilities added to the component after the kruise-rollout addon is installed. The upgrade policy of Helm can be specified. In the first stage, 20% of pods and traffic are upgraded. After manual approve, 90% is upgraded. Finally, the full version is upgraded to the latest version.

Note: We have prepared a chart to demonstrate the intuitive effect (reflecting the version changes). The body of the Helm chart contains a Deployment and Ingress object, which is the most common scenario when the Helm chart is made. If your Helm chart is also equipped with the resources above, you can also use this example to canary rollout your helm chart.

After the deployment is successful, we use the following command to access the gateway address in your cluster, and you will see the following result:

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V1

In addition, through the resource topology graph of VelaUX, we can see that the five V1 pods are all ready.

image

Upgrade an Application

Apply the following yaml to upgrade your application:

cat <<EOF | vela up -f -
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
name: canary-demo
annotations:
app.oam.dev/publishVersion: v2
spec:
components:
- name: canary-demo
type: helm
properties:
repoType: "helm"
url: "https://wangyikewxgm.github.io/my-charts/"
chart: "canary-demo"
# Upgade to version 2.0.0
version: "2.0.0"
traits:
- type: kruise-rollout
properties:
canary:
# The first batch of Canary releases 20% Pods, and 20% traffic imported to the new version, require manual confirmation before subsequent releases are completed
steps:
- weight: 20
# The second batch of Canary releases 90% Pods, and 90% traffic imported to the new version.
- weight: 90
trafficRoutings:
- type: nginx
EOF

We noticed that the new application only has two changes compared to the first deployment:

  1. We upgrade the annotation of the app.oam.dev/publishVersion from V1 to V2. This shows this revision is a new version.

  2. We upgrade the version of the Helm chart to 2.0.0. The tag of the deployment image in this version of the chart is upgraded to V2.

After a while, we will find that the upgrade process stops at the first batch we defined above. Only 20% of pods and traffic are upgraded. At this time, if you execute the command above to access the gateway multiple times, you will find that Demo: V1 and Demo: V2 appear alternately, and there is a probability of almost 20% of getting the result of Demo: V2.

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V2

Looking at the topology status of the application resources again, you will see that the rollout CR created by the kruise-rollout trait has created a new version of the pod for us, while the five old version pods created by the previous workload remain unchanged.

image

Next, we execute the following command through KubeVela's CLI to resume the upgrade through manual review:

$ vela workflow resume canary-demo

After a while, we saw that five new versions of pods were created through the resource topology diagram. At this time, if we visit the gateway again, we will find that the probability of Demo: V2 has increased significantly, which is close to 90%.

Quick Rollback

Generally, in the rollout of a real-world scenario, after a manual review, it is found that the status of the new version of the application is abnormal. You need to terminate the current upgrade and quickly roll back the application to the version before the upgrade starts.

We can execute the following command to suspend the current publishing workflow first:

$ vela workflow suspend canary-demo
Rollout default/canary-demo in cluster suspended.
Successfully suspend workflow: canary-demo

Then, roll back to the version before the rollout, which is V1:

$ vela workflow rollback canary-demo
Application spec rollback successfully.
Application status rollback successfully.
Rollout default/canary-demo in cluster rollback.
Successfully rollback rolloutApplication outdated revision cleaned up.

At this time, when we access the gateway again, we will find that all the request results have returned to the V1 state:

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V1

At this time, we can see that all pods of the canary version have been deleted through the resource topology diagram, and from the beginning to the end, the five pods of V1, as the pods of the stable version, remain unchanged.

image

If you change the rollback operation above to resume the upgrade, the subsequent upgrade process will continue to complete the full rollout.

Please see this link for the complete operation procedure of the preceding demo.

Please refer to this link if you want to directly use native Kubernetes resources to implement the process above.

In addition to Deployment, the kruise-rollout addon supports StatefulSet and OpenKruise CloneSet. If the workload types in your chart are one of the three types mentioned above, canary rollout can be implemented through the example above.

I believe you can also notice that the example above is a nginx-Ingress-controller-based seven-layer traffic splitting scheme. In addition, we support Kubernetes Gateway's API to support more gateway types and four-layer traffic splitting schemes.

How Can the Stability of the Rollout Be Guaranteed?

After the first deployment, the kruise rollout addon (hereinafter referred to as rollout) listens to the resources deployed by the Helm chart, which are deployment, service, and ingress in the example. StatefulSet and OpenKruise Cloneset are also supported. The rollout will take over the subsequent upgrade actions of this deployment.

During the upgrade, the new version of Helm takes effect first, and the deployment image is updated to V2. However, the upgrade process of the deployment will be taken over by the rollout process from the controller-manager, so the pods under the deployment will not be upgraded. At the same time, the rollout will copy a canary version of deployment, with V2 as the tag of the image, and create a service to filter the pods below it, together with an ingress pointing to this service. Finally, this ingress will receive the traffic of the canary version by setting the annotation corresponding to the ingress, thus enabling traffic splitting. For details, Please see this link.

After all manual confirmation steps are passed, and the full rollout is completed, the rollout will return the upgrade control of deployment of the stable version to the controller-manager. At that time, the stable version of pods will be upgraded to the new version one after another. When the stable version pods are all ready, the canary version of deployment, service, and ingress will be destroyed one after another, thus ensuring that the request traffic will not hit the unready pod during the whole process nor cause abnormal requests.

After that, we will continue to iterate in the following areas to support more scenarios and bring a more stable and reliable upgrade experience:

  1. The upgrade process is connected to KubeVela's workflow system, thus introducing a richer system of intermediate steps to expand the system and supporting functions (such as sending notifications through the workflow during the upgrade process). Even in the pause phase of each step, it can connect to the external observability system and automatically decide whether to continue publishing or roll back by checking indicators (such as logs or monitoring) to implement unattended publishing policies.
  2. It integrates with more addons (such as istio) to support the traffic splitting solution of Service Mesh.
  3. In addition to the percentage-based traffic splitting method, header-or cookie-based traffic splitting rules and features (such as blue-green publishing) are supported.

Summary

As mentioned earlier, Helm's canary rollout process is implemented by KubeVela through the addon system. fluxcd addon helps us deploy and manage the lifecycle of the Helm chart. The kruise-rollout addon helps us upgrade the workload pod and switch traffic during the upgrade process. By combining two addons, the full lifecycle management and canary upgrade of Helm applications are realized without any changes to your Helm chart. You can also write addon for your scenarios to complete more special scenes or processes.

Based on KubeVela's powerful scalability, you can flexibly combine these addons and dynamically replace the underlying capabilities according to different platforms or environments without any changes to the upper-level applications. For example, if you prefer to use argocd instead of fluxcd to implement the deployment of Helm applications, you can implement the same function by enabling the addon of argocd without any changes or migration at the upper-layer of Helm applications.

Now, the KubeVela community has provided dozens of addons, which can help the platform expand its capabilities in observability, GitOps, FinOps, rollout, etc.

image

You can find Addon's warehouse address here. If you are interested in addons, you are welcome to submit your custom addon and contribute new ecological capabilities to the community!

· 4 min read

You may have learned from this blog that we can use vela to manage cloud resources (like s3 bucket, AWS EIP and so on) via the terraform plugin. We can create an application which contains some cloud resource components and this application will generate these cloud resources, then we can use vela to manage them.

Sometimes we already have some Terraform cloud resources which may be created and managed by the Terraform binary or something else. In order to have the benefits of using KubeVela to manage the cloud resources or just maintain consistency in the way you manage cloud resources, we may want to import these existing Terraform cloud resources into KubeVela and use vela to manage them. But if we just create an application which describes these cloud resources, the cloud resources will be recreated and may lead to errors. To fix this problem, we made a simple backup_restore tool. This blog will show you how to use the backup_restore tool to import your existing Terraform cloud resources into KubeVela.

Step 1: Create Terraform Cloud Resources

Since we are going to demonstrate how to import an existing cloud resource into KubeVela, we need to create one first. If you already have such resources, you can skip this step.

Before start, make sure you have:

Let's get started!

  1. Create an empty directory to start.

    mkdir -p cloud-resources
    cd cloud-resources
  2. Create a file named main.tf which will create a S3 bucket:

    resource "aws_s3_bucket" "bucket-acl" {
    bucket = var.bucket
    acl = var.acl
    }

    output "RESOURCE_IDENTIFIER" {
    description = "The identifier of the resource"
    value = aws_s3_bucket.bucket-acl.bucket_domain_name
    }

    output "BUCKET_NAME" {
    value = aws_s3_bucket.bucket-acl.bucket_domain_name
    description = "The name of the S3 bucket"
    }

    variable "bucket" {
    description = "S3 bucket name"
    default = "vela-website"
    type = string
    }

    variable "acl" {
    description = "S3 bucket ACL"
    default = "private"
    type = string
    }
  3. Configure the AWS Cloud provider credentials:

    export AWS_ACCESS_KEY_ID="your-accesskey-id"
    export AWS_SECRET_ACCESS_KEY="your-accesskey-secret"
    export AWS_DEFAULT_REGION="your-region-id"
  4. Set the variables in the main.tf file:

    export TF_VAR_acl="private"; export TF_VAR_bucket="your-bucket-name"
  5. (Optional) Create a backend.tf to configure your Terraform backend. We just use the default local backend in this example.

  6. Run terraform init and terraform apply to create the S3 bucket:

    terraform init && terraform apply
  7. Check the S3 bucket list to make sure the bucket is created successfully.

  8. Run terraform state pull to get the Terraform state of the cloud resource and store it into a local file:

    terraform state pull > state.json

Step 2: Import Existing Terraform Cloud Resources into KubeVela

  1. Create the application.yaml file, please ensure that the description of each field of Component is consistent with your cloud resource configuration:

    apiVersion: core.oam.dev/v1beta1
    kind: Application
    metadata:
    name: app-aws-s3
    spec:
    components:
    - name: sample-s3
    type: aws-s3
    properties:
    bucket: vela-website-202110191745
    acl: private
    writeConnectionSecretToRef:
    name: s3-conn
  2. Get the backup_restore tool:

    git clone https://github.com/kubevela/terraform-controller.git
    cd terraform-controller/hack/tool/backup_restore
  3. Run the restore command:

    go run main.go restore --application <path/to/your/application.yaml> --component sample-s3 --state <path/to/your/state.json>

    The above command will resume the Terraform backend in the Kubernetes first and then create the application without recreating the S3 bucket.

That's all! You have successfully migrate the management of the S3 bucket to KubeVela!

What's more

For more information about the backup_restore tool, please read the doc. If you have any problem, issues and pull requests are always welcome.

· 8 min read
Jianbo Sun

Helm Charts are very popular that you can find almost 10K different software packaged in this way. While in today's multi-cluster/hybrid cloud business environment, we often encounter these typical requirements: distribute to multiple specific clusters, specific group distributions according to business need, and differentiated configurations for multi-clusters.

In this blog, we'll introduce how to use KubeVela to do multi cluster delivery for Helm Charts.

If you don't have multi clusters, don't worry, we'll introduce from scratch with only Docker or Linux System required. You can also refer to the basic helm chart delivery in single cluster.

· 8 min read
Jianbo Sun

If you're looking for something to glue Terraform ecosystem with the Kubernetes world, congratulations! You're getting exactly what you want in this blog.

We will introduce how to integrate terraform modules into KubeVela by fixing a real world problem -- "Fixing the Developer Experience of Kubernetes Port Forwarding" inspired by article from Alex Ellis.

In general, this article will be divided into two parts:

  • Part.1 will introduce how to glue Terraform with KubeVela, it needs some basic knowledge of both Terraform and KubeVela. You can just skip this part if you don't want to extend KubeVela as a Developer.
  • Part.2 will introduce how KubeVela can 1) provision a Cloud ECS instance by KubeVela with public IP; 2) Use the ECS instance as a tunnel sever to provide public access for any container service within an intranet environment.

OK, let's go!

· 13 min read

KubeVela is a modern software delivery control panel. The goal is to make application deployment and O&M simpler, more agile, and more reliable in today's hybrid multi-cloud environment. Since the release of Version 1.1, the KubeVela architecture has naturally solved the delivery problems of enterprises in the hybrid multi-cloud environments and has provided sufficient scalability based on the OAM model, which makes it win the favor of many enterprise developers. This also accelerates the iteration of KubeVela.

In Version 1.2, we released an out-of-the-box visual console, which allows the end user to publish and manage diverse workloads through the interface. The release of Version 1.3 improved the expansion system with the OAM model as the core and provides rich plug-in functions. It also provides users with a large number of enterprise-level functions, including LDAP permission authentication, and provides more convenience for enterprise integration. You can obtain more than 30 addons in the addons registry of the KubeVela community. There are well-known CNCF projects (such as argocd, istio, and traefik), database middleware (such as Flink and MySQL), and hundreds of cloud vendor resources.

In Version 1.4, we focused on making application delivery safe, foolproof, and transparent. We added core functions, including multi-cluster permission authentication and authorization, a complex resource topology display, and a one-click installation control panel. We comprehensively strengthened the delivery security in multi-tenancy scenarios, improved the consistent experience of application development and delivery, and made the application delivery process more transparent.

· 8 min read

One of the biggest requests from KubeVela community is to provide a transparent delivery process for resources in the application. For example, many users prefer to use Helm Chart to package a lot of complex YAML, but once there is any issue during the deployment, such as the underlying storage can not be provided normally, the associated resources are not created normally, or the underlying configuration is incorrect, etc., even a small problem will be a huge threshold for troubleshooting due to the black box of Helm chart. Especially in the modern hybrid multi-cluster environment, there is a wide range of resources, how to obtain effective information and solve the problem? This can be a very big challenge.

resource graph

As shown in the figure above, KubeVela has offered a real-time observation resource topology graph for applications, which further improves KubeVela's application-centric delivery experience. Developers only need to care about simple and consistent APIs when initiating application delivery. When they need to troubleshoot problems or pay attention to the delivery process, they can use the resource topology graph to quickly obtain the arrangement relationship of resources in different clusters, from the application to the running status of the Pod instance. Automatically obtain resource relationships, including complex and black-box Helm Charts.

In this post, we will describe how this new feature of KubeVela is implemented and works, and the roadmap for this feature.

· 10 min read
Jianbo Sun

This blog will introduce how to use CUE and KubeVela to build you own abstraction API to reduce the complexity of Kubernetes resources. As a platform builder, you can dynamically customzie the abstraction, build a path from shallow to deep for your developers per needs, adapt to growing number of different scenarios, and meet the iterative demands of the company's long-term business development.

· 16 min read
KubeVela Community

Thanks to the contribution of hundreds of developers from KubeVela community and around 500 PRs from more than 30 contributors, KubeVela version 1.3 is officially released. Compared to v1.2 released three months ago, this version provides a large number of new features in three aspects as OAM engine (Vela Core), GUI dashboard (VelaUX) and addon ecosystem. These new features are derived from the in-depth practice of many end users such as Alibaba, LINE, China Merchants Bank, and iQiyi, and then finally become part of the KubeVela project that everyone can use out of the box.

Pain Points of Application Delivery

So, what challenges have we encountered in cloud-native application delivery?

· 7 min read
Wei Duan

Under today's multi-cluster business scene, we often encounter these typical requirements: distribute to multiple specific clusters, specific group distributions according to business need, and differentiated configurations for multi-clusters.

KubeVela v1.3 iterates based on the previous multi-cluster function. This article will reveal how to use it to do swift multiple clustered deployment and management to address all your anxieties.

· 6 min read
Xiangbo Ma

The cloud platform development team of China Merchants Bank has been trying out KubeVela since 2021 internally and aims to using it for enhancing our primary application delivery and management capabilities. Due to the specific security concern for financial insurance industry, network control measurements are relatively strict, and our intranet cannot directly pull Docker Hub image, and there is no Helm image source available as well. Therefore, in order to landing KubeVela in the intranet, you must perform a complete offline installation.

This article will take the KubeVela V1.2.5 version as an example, introduce the offline installation practice to help other users easier to complete KubeVela's deployment in offline environment.