
董天欣

Serverless App Engine (SAE) is a cloud product built on Kubernetes that combines the Serverless architecture with the microservice architecture. As a continuously iterating cloud product, it has run into many challenges during its rapid growth. How can these challenges be solved in the booming cloud-native era, and how can a reliable and fast cloud architecture upgrade be carried out? The SAE team and the KubeVela community worked closely together on these challenges and produced an open-source, replicable cloud-native solution: KubeVela Workflow.

This article describes in detail how SAE upgraded its architecture with KubeVela Workflow, and walks through several practical scenarios one by one.

Challenges in the Serverless Era

Serverless App Engine (SAE) is a one-stop application hosting platform for business and microservice application architectures; it is a cloud product built on Kubernetes that combines the Serverless architecture with the microservice architecture.

image.png

As shown in the architecture diagram above, SAE users can host many different types of business applications on SAE. Underneath, SAE's Java business layer handles the business logic and interacts with Kubernetes resources, while the bottom layer relies on an elastic resource pool that is highly available, operations-free, and pay-as-you-go. In this architecture, SAE mainly relies on its Java business layer to provide features to users. While this lets users deploy applications with one click, it also brings quite a few challenges.

As Serverless keeps evolving, SAE faces three major challenges:

  1. SAE's own engineers deal with complex, non-standardized operation processes in day-to-day development and operations. How can these complex operations be automated to reduce manual effort?
  2. As the business grows, SAE's application release feature has become popular with a large number of users. User growth also brings efficiency challenges: facing high concurrency from many users, how can the existing release feature be optimized and made more efficient?
  3. As Serverless continues to land in enterprises, every major vendor is making its product portfolio Serverless. Against this trend, how can SAE quickly integrate internal Serverless capabilities and ship new features while keeping development costs down?

Looking at these three challenges, it is clear that SAE needs some kind of orchestration engine to upgrade its release feature, integrate internal capabilities, and automate operations.

This orchestration engine has to meet the following requirements to solve these challenges:

  1. Highly extensible. The nodes in the engine's workflows must be highly extensible; only then can the originally non-standardized, complex operations be turned into nodes and combined with the engine's flow-control capability to achieve a 1 + 1 > 2 effect and reduce manual effort.
  2. Lightweight and efficient. The engine must be efficient and production-ready so that it can meet SAE's high-concurrency needs at large user scale.
  3. Strong integration and flow-control capabilities. The engine must be able to quickly integrate the business's atomic capabilities and turn the glue code that used to connect upstream and downstream capabilities into workflow processes inside the engine, thereby reducing development costs.

Based on these challenges and considerations, SAE and the KubeVela community cooperated deeply and launched the KubeVela Workflow project as the orchestration engine.

Why KubeVela Workflow?

Thanks to the booming cloud-native ecosystem, there are already many mature workflow projects in the community, such as Tekton and Argo, and Alibaba Cloud also has several in-house orchestration engines. So why "build a new wheel" instead of using existing technology?

Because KubeVela Workflow has a fundamental difference in its design: the steps in a workflow are designed for the cloud-native IaC system and support abstraction, encapsulation, and reuse. In effect, you can directly call atomic capabilities at the level of custom functions in a step, rather than merely launching containers.

image.png

In KubeVela Workflow, every step has a step type, and every step type corresponds to a WorkflowStepDefinition resource. You can write a step definition in CUE (an IaC language that is a superset of JSON), or directly use the step types already defined by the community.

You can simply think of a step type definition as a function declaration: defining a new step type is like defining a new function. Just as a function takes input parameters, so does a step definition. In a step definition, the parameter field declares the input parameters the step needs and their types. When the workflow starts running, the workflow controller executes the CUE code in the corresponding step definition with the actual parameter values passed in by the user, just like calling your function.
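As a minimal sketch of this "step type as function" idea, consider the hypothetical step type below. The name say-hello and its parameter are invented for illustration; only the overall shape (the definition header, the template, and the parameter block) follows WorkflowStepDefinition conventions, and op.#Log is one of the actions the vela/op package provides.

```cue
import (
	"vela/op"
)

// Hypothetical step type "say-hello"; the name and parameter are
// invented for illustration.
"say-hello": {
	type:        "workflow-step"
	description: "Print a greeting in the workflow logs."
}
template: {
	// The controller executes this CUE with the user's actual
	// parameter values, like calling a function with its arguments.
	log: op.#Log & {
		data: "Hello, \(parameter.who)!"
	}
	// The declared inputs of this "function"
	parameter: {
		who: string
	}
}
```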

With this layer of abstraction over steps, steps gain enormous possibilities:

  • If you want to define your own step type, it is just like writing a new function: inside the step definition you can import official code packages directly, thereby baking other atomic capabilities into the step, including HTTP calls; dispatching, deleting, and listing resources across multiple clusters; conditional waits; and more. This also means that, with such a programmable step type, you can easily integrate with any system. In SAE's case, the step definitions handle the integration with other internal atomic capabilities (such as MSE, ACR, ALB, and SLS), and the workflow's orchestration capability controls the process:

image.png

  • If you only want to use predefined steps, it is just like calling an encapsulated third-party function: you only need to care about your input parameters and use the corresponding step type. Take a typical image-build scenario: first specify the step type build-push-image, then specify your input parameters: the code source and branch to build from, the name of the resulting image, and the secret to use when pushing to the image registry.
apiVersion: core.oam.dev/v1alpha1
kind: WorkflowRun
metadata:
  name: build-push-image
  namespace: default
spec:
  workflowSpec:
    steps:
      - name: build-push
        type: build-push-image
        properties:
          context:
            git: github.com/FogDong/simple-web-demo
            branch: main
          image: fogdong/simple-web-demo:v1
          credentials:
            image:
              name: image-secret

In such an architecture, the abstraction over steps gives steps themselves unlimited possibilities. When you need to add a node to a process, you no longer have to compile, build, and package business code and run the logic in a Pod; you only need to modify the configuration code in the step definition, and combined with the workflow engine's own orchestration capability, the integration of the new feature is done.

This is also the cornerstone of SAE's use of KubeVela Workflow: only on an extensible foundation can the power of the ecosystem be fully leveraged to accelerate product upgrades.

Use Cases

Next, let's dive into the scenarios where SAE uses KubeVela Workflow and build a deeper understanding from the cases.

Case 1: Automating Operations

The first scenario is an automated operations scenario for SAE's internal operations engineers.

Inside SAE we have this scenario: we write and update some base images and need to pre-warm them into clusters in several different regions, so that users of these base images get a better experience.

The original process was very complex. It involved not only building images with ACR and pushing them to multiple regions, but also producing image cache templates and pushing those image caches into clusters in different regions, including Shanghai, US West, Shenzhen, Singapore, and so on. These operations were non-standardized and very time-consuming: when an operations engineer pushes these images from a local machine to an overseas cluster, the push can easily fail due to network bandwidth, so they had to divert attention to operations that could have been automated.

This is a perfect scenario for KubeVela Workflow: every single operation in the process can be turned, programmatically, into a step in a workflow, so the steps can be orchestrated and reused at the same time. KubeVela Workflow also provides a visual dashboard: operations engineers only need to configure the pipeline template once, and can then automate the process by triggering runs, optionally passing specific runtime parameters on each run.

The simplified steps are as follows:

  1. Use an HTTP-request step type to call ACR's service to build the image, and pass the image ID to the next step via parameter passing. This step definition waits for the ACR build to finish before the current step completes.
  2. If the first step fails, handle the error for that image.
  3. If the first step succeeds, use an HTTP-request step to call the image-cache build service, and use that service's logs as the current step's log source. The step's logs can then be viewed directly in the UI to troubleshoot problems.
  4. Use a step group containing resource-dispatch steps to pre-warm the images in the Shanghai and US West clusters: this uses KubeVela Workflow's multi-cluster management capability to dispatch ImagePullJob workloads directly into multiple clusters for image pre-warming.
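The four steps above can be sketched as a WorkflowRun like the following. The step types (request, step-group, apply-object) come from the community step catalog, while the service URL and the ImagePullJob payload are placeholders, not SAE's real endpoints:

```yaml
# Hedged sketch of the pre-warm pipeline; URLs and payloads are invented.
apiVersion: core.oam.dev/v1alpha1
kind: WorkflowRun
metadata:
  name: warm-up-images
spec:
  workflowSpec:
    steps:
      - name: build-image
        type: request                           # call the ACR build service
        properties:
          url: https://acr.example.com/build    # hypothetical endpoint
          method: POST
        outputs:
          - name: imageID
            valueFrom: response.body.imageID
      - name: pre-warm
        type: step-group                        # fan out to multiple clusters
        subSteps:
          - name: warm-shanghai
            type: apply-object
            properties:
              cluster: shanghai                 # assumes the cluster is registered
              value:
                apiVersion: apps.kruise.io/v1alpha1
                kind: ImagePullJob
                # ...spec elided...
```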

image.png

In the process above, without KubeVela Workflow you might need to write a piece of business code to string together multiple services and clusters. Take the last step, dispatching ImagePullJob workloads to multiple clusters, as an example: you would not only have to manage the multi-cluster configuration, but also watch the status of the workload (a CRD) and proceed only once the workload became Ready. This process actually corresponds to the reconcile logic of a simple Kubernetes operator: create or update a resource, finish the reconcile if the resource's status matches expectations, and keep waiting otherwise.

Do we really need to implement an operator for every new kind of resource our operations work touches? Is there a lightweight way to free us from complex operator development?

It is precisely the programmability of steps in KubeVela Workflow, which fully covers the operations and resource management in SAE's scenarios, that helps the engineers reduce manual effort. Logic like the above is very simple when expressed in a KubeVela Workflow step definition; whatever similar resource (or HTTP API call) is involved, a step template like the following can cover it:

template: {
	// Step 1: read the resource from the given cluster
	read: op.#Read & {
		value: {
			apiVersion: parameter.apiVersion
			kind:       parameter.kind
			metadata: {
				name:      parameter.name
				namespace: parameter.namespace
			}
		}
		cluster: parameter.cluster
	}
	// Step 2: wait until the resource status is Ready; otherwise keep waiting
	wait: op.#ConditionalWait & {
		continue: read.value.status != _|_ && read.value.status.phase == "Ready"
	}
	// Step 3 (optional): once the resource is Ready, ...
	// other logic ...

	// Declared parameters that users of this step type must pass in
	parameter: {
		apiVersion: string
		kind:       string
		name:       string
		namespace:  *context.namespace | string
		cluster:    *"" | string
	}
}

Mapped onto the current scenario, this means:

  1. Step 1: read the status of the ImagePullJob in the given cluster (for example, the Shanghai cluster).
  2. Step 2: if the ImagePullJob is Ready, the image has been pre-warmed; the current step succeeds and the next step runs.
  3. Step 3: once the ImagePullJob is Ready, clean up the ImagePullJob in the cluster.

With customization like this, no matter how many new regional clusters or new resource types are later added in operations scenarios, you can first bring the cluster's KubeConfig under KubeVela Workflow's management, then use the already-defined step type with a different cluster name or resource type. This achieves a lightweight version of a Kubernetes operator's reconcile process and greatly reduces development costs.
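For example, assuming the definition above is registered under the hypothetical step type name read-and-wait, waiting for a pre-warm job in the Shanghai cluster becomes just a matter of parameters:

```yaml
# Using the step type sketched above; the type name `read-and-wait`
# and the job name are hypothetical, but the properties match the
# `parameter` block of the definition.
- name: wait-shanghai
  type: read-and-wait
  properties:
    apiVersion: apps.kruise.io/v1alpha1
    kind: ImagePullJob
    name: base-image-warmup
    namespace: default
    cluster: shanghai
```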

Case 2: Optimizing the Existing Release Process

Beyond automating internal operations, upgrading SAE's original product architecture, and thereby improving the product's value and users' release efficiency, is another important reason SAE chose KubeVela Workflow.

The Original Architecture

In SAE's release scenario, one release corresponds to a series of tasks, such as initializing the environment, building images, releasing in batches, and so on. This series of tasks corresponds to the SAE Tasks in the figure below.

In SAE's original architecture, these tasks were handed, in order, to a Java Executor that performed the actual business logic, such as dispatching resources to Kubernetes and syncing the current task's state with a MySQL database.

After the current task finished, the Java Executor fetched the next task from SAE's original orchestration engine, while that engine kept putting new tasks onto the head of the task list.

image.png

The biggest problem in this old architecture was polling: the Java Executor checked SAE's task list every second for new tasks, and after dispatching Kubernetes resources it also tried to fetch the resources' status from the cluster every second. SAE's original business architecture was not the controller pattern of the Kubernetes ecosystem but a polling business model; upgrading the orchestration layer to an event-driven controller pattern would integrate better with the whole Kubernetes ecosystem and improve efficiency.

image.png

But in the original architecture, the business logic was deeply coupled. With a traditional container-based cloud-native workflow, SAE would have had to package the original business logic into images and maintain and update a pile of image artifacts, which is not a sustainable path. We wanted the upgraded workflow engine to integrate, in a lightweight way, with SAE's task orchestration, its business execution layer, and the Kubernetes clusters.

The New Architecture

image.png

With KubeVela Workflow's high extensibility, SAE's engineers neither had to repackage the existing capabilities into images nor make large-scale modifications.

The new process is shown above: after a release order is created on the SAE product side, the business side writes the model into the database, performs model conversion, and generates a KubeVela Workflow, which corresponds to the YAML on the right.

Meanwhile, SAE's original Java Executor now exposes the original business capabilities as microservice APIs. When the KubeVela Workflow executes, every step is IaC-based and implemented in CUE underneath. Some steps call SAE's business microservice APIs; others interact directly with the underlying Kubernetes resources. Steps can also pass data between each other, and if a call fails, the step's conditional judgment can perform the error handling.

This optimization is extensible and fully reuses the Kubernetes ecosystem: both the workflow process and the atomic capabilities can be extended, and the whole system is declarative and final-state oriented. The combination of extensibility and flow control covers the original business features while reducing the amount of development. Meanwhile, status updates dropped from polling latency measured in seconds to milliseconds, agile and native: beyond YAML-level expressiveness, the development effort for a new feature went from roughly a day to roughly an hour.

Case 3: Shipping New Features Quickly

Besides automating operations and upgrading the original architecture, what else can KubeVela Workflow offer?

The reusability of steps and the ease of integrating with other ecosystems gave SAE an extra surprise beyond the upgrade itself: instead of writing business code, new product features can be shipped quickly by orchestrating different steps!

SAE has accumulated a large Java foundation and supports rich features such as single-batch, batched, and canary releases for Java microservices. But as customers grew, some raised a new requirement: a multi-language, north-south traffic canary release capability.

In the thriving cloud-native ecosystem there are many mature open-source products for canary releases, such as Kruise Rollout. After investigation, SAE's engineers found they could use Kruise Rollout for the canary capability, combined with Alibaba Cloud's internal ingress controller, ALB, to split the different traffic.

image.png

This solution settled into the architecture diagram above: SAE dispatches a KubeVela Workflow whose steps integrate with Alibaba Cloud's ALB, the open-source Kruise Rollout, and SAE's own business components, and manage the release batches within the steps, thereby shipping the feature quickly.

In fact, with KubeVela Workflow, shipping this feature no longer required any new business code, only a new step type that updates the canary batch.

Thanks to the programmability of step types, we can easily use different update strategies in the definition to update the Rollout object's release batches and the target clusters. Moreover, every step type in a workflow is reusable, which means that when you develop a new step type you are also laying the groundwork for the next new feature, and the one after that. This reuse lets you accumulate capabilities quickly and greatly reduces development costs.
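As a hedged illustration of such a step type, the sketch below reads a Kruise Rollout object and waits for the requested batch to complete. op.#Read and op.#ConditionalWait are real vela/op actions, but the step type name, the status fields, and the readiness condition are simplified for illustration and are not the exact Kruise Rollout API:

```cue
import (
	"vela/op"
)

// Hypothetical "promote-batch" step type; names and fields below
// are illustrative.
"promote-batch": {
	type: "workflow-step"
}
template: {
	// Read the Rollout object from the target cluster
	rollout: op.#Read & {
		value: {
			apiVersion: "rollouts.kruise.io/v1alpha1"
			kind:       "Rollout"
			metadata: {
				name:      parameter.name
				namespace: parameter.namespace
			}
		}
		cluster: parameter.cluster
	}
	// Wait until the requested batch has been released
	wait: op.#ConditionalWait & {
		continue: rollout.value.status != _|_ &&
			rollout.value.status.canaryStatus.currentStepIndex >= parameter.batch
	}
	parameter: {
		name:      string
		namespace: *context.namespace | string
		cluster:   *"" | string
		batch:     int
	}
}
```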

With this efficient orchestration capability we can make changes quickly: on top of the common change, if a particular customer needs the feature enabled, a customized change can be rolled out rapidly.

Summary

After upgrading its architecture with KubeVela, SAE not only improved release efficiency but also reduced development costs, and can now fully play to the product's strengths in application hosting on top of the underlying Serverless infrastructure.

Moreover, KubeVela Workflow's architecture-upgrade solution at SAE is a replicable open-source solution. The community already ships 50+ built-in step types, including building and pushing images, image scanning, multi-cluster deployment, managing infrastructure with Terraform, conditional waits, and message notifications, which can help you easily connect the whole CI/CD pipeline.

image.png

You can also refer to the following documents for more use cases:

乔中沛

Background

As the Internet of Everything gradually becomes commonplace and the compute power of edge devices keeps growing, extending cloud-native technology to devices and the edge, and using the strengths of cloud computing to satisfy complex, diverse edge application scenarios, has become a new technical challenge. "Cloud-edge collaboration" is gradually becoming a new focus. This article presents a cloud-edge collaboration solution around two CNCF open-source projects, KubeVela and OpenYurt, using a real Helm application delivery scenario.

OpenYurt focuses on extending Kubernetes to edge computing in a non-intrusive way. Relying on native Kubernetes' container orchestration and scheduling capabilities, OpenYurt brings edge compute power under unified management in the Kubernetes infrastructure and provides capabilities such as edge autonomy, efficient operations channels, unitized edge management, edge traffic topology, secure containers, edge Serverless/FaaS, and heterogeneous resource support, building unified infrastructure for cloud-edge collaboration in a Kubernetes-native way.

KubeVela grew out of the OAM model and focuses on helping enterprises build unified application delivery and management capabilities. It shields business developers from the complexity of the underlying infrastructure, provides flexible extensibility, and offers out-of-the-box features such as microservice container management, cloud resource management, versioning and canary release, scaling, observability, resource dependency orchestration and data passing, multi-cluster support, CI integration, and GitOps. It maximizes developers' self-service application management productivity while also satisfying the platform's long-term evolution and extensibility needs.

What problems can combining OpenYurt and KubeVela solve?

As described above, OpenYurt handles the onboarding of edge nodes, letting users manage edge nodes by operating native Kubernetes. Edge nodes usually refer to compute resources closer to the user, such as virtual machines or physical servers in a nearby machine room; once added via OpenYurt, they become usable Kubernetes nodes (Node). OpenYurt uses node pools (NodePool) to describe a group of edge nodes in the same region. With basic resource management in place, we usually have the following core requirements for orchestrating and deploying applications to the different node pools in a cluster:

  1. Unified configuration: manually editing every resource to be dispatched requires a lot of human intervention and is very error-prone. We need a unified way to configure parameters, which not only enables convenient batch management operations but can also hook into security and auditing to meet enterprise risk-control and compliance needs.
  2. Differentiated deployment: workloads deployed to different node pools share most attributes, yet there are always individual configuration differences. The key is how to set the node-selection-related parameters; for example, a NodeSelector can instruct the Kubernetes scheduler to schedule a workload to a particular node pool.
  3. Extensibility: the cloud-native ecosystem is thriving, and both workload kinds and operational features keep growing. To meet business needs more flexibly, the overall application architecture must be extensible so we can fully enjoy the dividends of the cloud-native ecosystem.
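Requirement 2 boils down to node-selection parameters. A plain-Kubernetes sketch (the Deployment itself is illustrative) pins a workload to one pool via the label OpenYurt maintains on pool members:

```yaml
# Illustrative Deployment: schedule only onto nodes of the beijing pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-demo
spec:
  selector:
    matchLabels: { app: edge-demo }
  template:
    metadata:
      labels: { app: edge-demo }
    spec:
      nodeSelector:
        apps.openyurt.io/nodepool: beijing   # the pool-membership label
      containers:
        - name: demo
          image: nginx
```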

KubeVela, at the application layer, complements OpenYurt well and satisfies the three core requirements above. Next, we will demonstrate these capabilities through the actual operating process.

Deploying Applications to the Edge

We will use an Ingress controller as an example to show how to deploy an application to the edge with KubeVela. In this case, we want to deploy an Nginx Ingress controller into multiple node pools, so that services provided by a given node pool can be accessed through an edge Ingress, and an Ingress can only be handled by the Ingress controller of its own node pool. The cluster in the diagram has two node pools, Beijing and Shanghai, whose networks are not interconnected. We want to deploy an Nginx Ingress controller into each of them to serve as each pool's traffic entry point. A client close to Beijing can access services inside the Beijing node pool via that pool's Ingress controller, without reaching the services in the Shanghai node pool.

image.png

The Demo's Base Environment

We will use a Kubernetes cluster to simulate the edge scenario. The cluster has 3 nodes with the following roles:

  • Node 1: master node, cloud node
  • Node 2: worker node, edge node, in node pool beijing
  • Node 3: worker node, edge node, in node pool shanghai

Preparation

  1. Install YurtAppManager

    YurtAppManager is a core component of OpenYurt. It provides the NodePool CRD and its controller. OpenYurt has other components, but in this tutorial we only need YurtAppManager.

git clone https://github.com/openyurtio/yurt-app-manager
cd yurt-app-manager && helm install yurt-app-manager -n kube-system ./charts/yurt-app-manager/
  2. Install KubeVela and enable the FluxCD addon.

Install the vela CLI and install KubeVela in the cluster.

curl -fsSl https://kubevela.net/script/install.sh | bash
vela install

In this case, to reuse the mature Helm charts provided by the community, we use a Helm-type component to install the Nginx Ingress controller. In KubeVela's microkernel design, the Helm component type is provided by the FluxCD addon, so enable it:

vela addon enable fluxcd
  3. Prepare the node pools

Create two node pools: beijing and shanghai. In real edge scenarios, dividing node pools by region is a common pattern: different groups of nodes often have clear isolation properties such as disconnected networks, unshared and heterogeneous resources, and independent applications, which is how the node pool concept came about. In OpenYurt, features such as node pools and service topology help users handle these issues. In today's example, we will use node pools to describe and manage nodes.

kubectl apply -f - <<EOF
apiVersion: apps.openyurt.io/v1beta1
kind: NodePool
metadata:
  name: beijing
spec:
  type: Edge
  annotations:
    apps.openyurt.io/example: test-beijing
  taints:
    - key: apps.openyurt.io/example
      value: beijing
      effect: NoSchedule
---
apiVersion: apps.openyurt.io/v1beta1
kind: NodePool
metadata:
  name: shanghai
spec:
  type: Edge
  annotations:
    apps.openyurt.io/example: test-shanghai
  taints:
    - key: apps.openyurt.io/example
      value: shanghai
      effect: NoSchedule
EOF

Add the nodes to their respective node pools:

kubectl label node <node2> apps.openyurt.io/desired-nodepool=beijing
kubectl label node <node3> apps.openyurt.io/desired-nodepool=shanghai
kubectl get nodepool

Expected output:

NAME       TYPE   READYNODES   NOTREADYNODES   AGE
beijing    Edge   1            0               6m2s
shanghai   Edge   1            0               6m1s

Deploying Edge Applications in Batches

Before we dive into the details, let's see how KubeVela describes an application deployed to the edge. With the application below, we can deploy multiple copies of the Nginx Ingress controller into the respective edge node pools. Using one application to configure the Nginx Ingresses uniformly removes duplication, lowers the management burden, and makes later unified releases and operations on these in-cluster components easier.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: edge-ingress
spec:
  components:
    - name: ingress-nginx
      type: helm
      properties:
        chart: ingress-nginx
        url: https://kubernetes.github.io/ingress-nginx
        repoType: helm
        version: 4.3.0
        values:
          controller:
            service:
              type: NodePort
            admissionWebhooks:
              enabled: false
      traits:
        - type: edge-nginx
  policies:
    - name: replication
      type: replication
      properties:
        selector: [ "ingress-nginx" ]
        keys: [ "beijing", "shanghai" ]
  workflow:
    steps:
      - name: deploy
        type: deploy
        properties:
          policies: ["replication"]

A KubeVela Application has 3 parts:

  1. A helm-type component. It describes the version of the Helm package we want to install into the cluster. We also attach a trait, edge-nginx, to this component. We will show the trait's details later; for now you can regard it as a patch that carries the properties of the different node pools.
  2. A replication (component splitting) policy. It describes how to replicate components to different node pools. The selector field selects the components to replicate, and the keys field turns one component into two components with different keys ("beijing" and "shanghai").
  3. A deploy workflow step. It describes how to deploy the application, and specifies the replication policy as the policy that performs the replication work.

    Note:

    1. For this application to work, first apply the edge-ingress trait introduced below to the cluster.
    2. deploy is a KubeVela built-in workflow step. It can also be used together with the override and topology policies in multi-cluster scenarios.
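For comparison, here is a hedged sketch of the same deploy step used with the built-in topology and override policies in a multi-cluster setup; the cluster names and the replica override are illustrative:

```yaml
# Illustrative multi-cluster variant of the deploy step.
policies:
  - name: target-edge
    type: topology
    properties:
      clusters: ["beijing-cluster", "shanghai-cluster"]   # registered clusters
  - name: small-footprint
    type: override
    properties:
      components:
        - name: ingress-nginx
          traits:
            - type: scaler
              properties:
                replicas: 1
workflow:
  steps:
    - name: deploy
      type: deploy
      properties:
        policies: ["target-edge", "small-footprint"]
```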

Now we can apply the application to the cluster:

vela up -f app.yaml

Check the application status and the resources KubeVela created:

vela status edge-ingress --tree --detail

Expected output:

CLUSTER   NAMESPACE    RESOURCE                               STATUS    APPLY_TIME            DETAIL
local  ── default  ─┬─ HelmRelease/ingress-nginx-beijing     updated   2022-11-02 12:00:24   Ready: True
                    │                                                                        Status: Release reconciliation succeeded  Age: 153m
                    ├─ HelmRelease/ingress-nginx-shanghai    updated   2022-11-02 12:00:24   Ready: True
                    │                                                                        Status: Release reconciliation succeeded  Age: 153m
                    └─ HelmRepository/ingress-nginx          updated   2022-11-02 12:00:24   URL: https://kubernetes.github.io/ingress-nginx  Age: 153m
                                                                                             Ready: True
                                                                                             Status: stored artifact for revision '7bce426c58aee962d479ca84e5cfc6931c19c8995e31638668cb958d4a3486c2'

The vela CLI can not only aggregate and present application health at a high level; when needed, it can also pierce through the application straight to the underlying workloads and offer rich observation and debugging capabilities. For example, you can print the application's logs with vela logs, forward the ports of the deployed application to your local machine with vela port-forward, or use vela exec to get a shell inside an edge container and troubleshoot. If you want a more intuitive view of the application, KubeVela also offers the official web console addon, VelaUX; enable it and you can inspect a more detailed resource topology.

vela addon enable velaux

image.png

As you can see, KubeVela created two HelmRelease resources to deliver the Nginx Ingress controller Helm chart to the two node pools. The HelmRelease resources were processed by the FluxCD addon mentioned above, which installed Nginx Ingress in each of the cluster's two node pools. Use the following commands to check that the Ingress controller's Pod was created in the beijing node pool; the shanghai pool can be checked in the same way.

$ kubectl get node -l apps.openyurt.io/nodepool=beijing
NAME                      STATUS   ROLES    AGE   VERSION
iz0xi0r2pe51he3z8pz1ekz   Ready    <none>   23h   v1.24.7+k3s1

$ kubectl get pod ingress-nginx-beijing-controller-c4c7cbf64-xthlp -oyaml | grep iz0xi0r2pe51he3z8pz1ekz
  nodeName: iz0xi0r2pe51he3z8pz1ekz

Differentiated Deployment

How does KubeVela achieve differentiated deployment of the same component during application delivery? Let's dig deeper into the Trait and Policy that support the application. As mentioned above, we used KubeVela's built-in component-splitting replication Policy in the workflow and attached a custom edge-nginx Trait to the ingress-nginx component.

  • The replication Policy splits the component into two components with different context.replicaKey values.
  • The edge-nginx Trait uses the different context.replicaKey values to deliver Helm charts with different configuration values to the cluster, so that the two Nginx Ingress controllers run in different node pools and watch Ingress resources with different ingressClasses. Concretely, it patches the Helm chart's values, modifying the fields related to node selection, affinity, and ingressClass.
  • Different patch strategies (PatchStrategy) are used when patching different fields: for example, the retainKeys strategy overwrites the original value, while the jsonMergePatch strategy merges with it.
"edge-nginx": {
	type: "trait"
	annotations: {}
	attributes: {
		podDisruptive: true
		appliesToWorkloads: ["helm"]
	}
}
template: {
	patch: {
		// +patchStrategy=retainKeys
		metadata: {
			name: "\(context.name)-\(context.replicaKey)"
		}
		// +patchStrategy=jsonMergePatch
		spec: values: {
			ingressClassByName: true
			controller: {
				ingressClassResource: {
					name:            "nginx-" + context.replicaKey
					controllerValue: "openyurt.io/" + context.replicaKey
				}
				_selector
			}
			defaultBackend: {
				_selector
			}
		}
	}
	_selector: {
		tolerations: [
			{
				key:      "apps.openyurt.io/example"
				operator: "Equal"
				value:    context.replicaKey
			},
		]
		nodeSelector: {
			"apps.openyurt.io/nodepool": context.replicaKey
		}
	}
	parameter: null
}

As More Applications Move to the Edge

As you can see, to deploy Nginx Ingress to different node pools, we only wrote a custom Trait of some forty lines and made full use of KubeVela's built-in capabilities. As the cloud-native ecosystem flourishes and cloud-edge collaboration trends onward, more applications are likely to move toward edge deployment. When a new scenario requires a new application in an edge node pool, there is no need to worry: with KubeVela's help, it is easy to extend a new edge-deployment Trait following this pattern, without writing code. For example, suppose we also want to bring an implementation of Gateway API, a recent focus of the Kubernetes community's evolution, to the edge, using Gateway API to improve the expressiveness and extensibility of service exposure in edge node pools and to use role-based network APIs on edge nodes. For such a scenario we can easily complete the deployment based on the extension approach above, just by defining a new Trait like the following.

"gateway-nginx": {
	type: "trait"
	annotations: {}
	attributes: {
		podDisruptive: true
		appliesToWorkloads: ["helm"]
	}
}

template: {
	patch: {
		// +patchStrategy=retainKeys
		metadata: {
			name: "\(context.name)-\(context.replicaKey)"
		}
		// +patchStrategy=jsonMergePatch
		spec: values: {
			_selector
			fullnameOverride: "nginx-gateway-nginx-" + context.replicaKey
			gatewayClass: {
				name:           "nginx" + context.replicaKey
				controllerName: "k8s-gateway-nginx.nginx.org/nginx-gateway-nginx-controller-" + context.replicaKey
			}
		}
	}
	_selector: {
		tolerations: [
			{
				key:      "apps.openyurt.io/example"
				operator: "Equal"
				value:    context.replicaKey
			},
		]
		nodeSelector: {
			"apps.openyurt.io/nodepool": context.replicaKey
		}
	}
	parameter: null
}

This Trait is very similar to the one used earlier to deploy Nginx Ingress. We make similar patches to the Nginx Gateway chart's values, covering node selection, affinity, and resource names; the difference is that this Trait specifies a gatewayClass instead of an IngressClass. The Trait and application files for this case are available in the GitHub repository. By defining such a Trait, we give the cluster the ability to deploy a new kind of application to the edge. Even if we cannot foresee all the application deployment needs that the evolution of edge computing will bring, we can at least keep adapting to new scenarios through this easily extensible approach.

How KubeVela Solves the Edge Deployment Problems

Let's review how KubeVela solves the key problems raised at the beginning of the article:

  1. Unified configuration: we use one component to describe the common properties of the ingress-nginx Helm chart to be deployed, such as the Helm repository, chart name, and version.
  2. Differentiated properties: KubeVela uses a user-defined trait definition that describes the differences in the Helm configuration delivered to the different node pools. The trait can be reused to deploy the same Helm chart.
  3. Extensibility: KubeVela can be extended programmatically for common workloads (such as Deployment/StatefulSet) or other packaging forms (such as Helm/Kustomize/...); a few dozen lines of code are enough to push a new kind of application to the edge.

All of this is powered by KubeVela's capabilities in application delivery and management. Beyond solving application definition, delivery, operations, and observability within a single cluster, KubeVela natively supports application release and management in multi-cluster mode. There is currently no fixed Kubernetes deployment pattern for edge computing; whether the architecture is a single cluster plus edge node pools or multiple edge clusters, KubeVela can handle the application management tasks within it. With OpenYurt and KubeVela working together, cloud and edge applications are deployed in a unified way and share the same abstractions, operations, and observability capabilities, avoiding a fragmented experience across scenarios. Cloud and edge applications alike can use the excellent cloud-native ecosystem practices that KubeVela keeps integrating in the form of addons. In the future, the KubeVela community will continue to enrich out-of-the-box system addons and deliver better and easier-to-use application delivery and management capabilities. If you want to learn more about application deployment and management capabilities, read the official KubeVela documentation; to keep up with the KubeVela community, you are welcome to join it (DingTalk group 23310022) and participate in the discussion. If you are interested in OpenYurt, you are also welcome in the OpenYurt community (DingTalk group 31993519).

Gokhan Karadas

This document explains the integration of KubeVela and ArgoCD. There are two approaches to integrating this flow, and this doc describes the pros and cons of each. Before diving deep into the details, let's describe KubeVela and ArgoCD.

KubeVela is a modern software delivery platform that makes deploying and operating applications across multi environments easier, faster, and more reliable.

KubeVela is infrastructure agnostic and application-centric. It allows you to build robust software and deliver it anywhere! KubeVela provides an Open Application Model (OAM) based abstraction approach to ship applications and any resources across multiple environments.

Open Application Model (OAM) is a set of standard yet higher-level abstractions for modeling cloud-native applications on top of today’s hybrid and multi-cloud environments. You can find more conceptual details here.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: first-vela-app
spec:
  components:
    - name: express-server
      type: webservice
      properties:
        image: oamdev/hello-world
        ports:
          - port: 8000
            expose: true
      traits:
        - type: scaler
          properties:
            replicas: 1

ArgoCD

Argo CD is a declarative continuous delivery tool for Kubernetes applications. It uses the GitOps style to create and manage Kubernetes clusters. When any changes are made to the application configuration in Git, Argo CD compares them with the configuration of the running application and notifies users to bring the desired and live state into sync.

Flux

By default, KubeVela works with the Flux addon to complete the GitOps journey for applications. You need to enable the fluxcd addon to perform Git operations quickly: https://kubevela.io/docs/end-user/gitops/fluxcd.

vela addon enable fluxcd

Integration with ArgoCD

If you want to use KubeVela with ArgoCD, there are two options to enable it:

  • Kubevela dry-run option
  • Kubevela + ArgoCD Gitops syncer mode

1. Vela dry-run approach

The first approach does not use the KubeVela GitOps controller; instead, ArgoCD acts as the GitOps controller, which means ArgoCD applies raw manifests to the target cluster. KubeVela lets us export OAM application components and traits to native Kubernetes resources, so we can still use the OAM-based model while ArgoCD manages the resources with a custom argocd-repo-server plugin. Behind this approach, KubeVela and OAM can also work as a client-side tool via the dry-run feature.

This part describes how ArgoCD applies dry-run mode to KubeVela resources. First, both ArgoCD and KubeVela should be installed.

ArgoCD version v2.5.4

Kubevela version v1.6.4

Prerequisites

Argo CD allows integrating more config management tools using config management plugins.

1. Write the plugin configuration file

Plugins will be configured via a ConfigManagementPlugin manifest located inside the plugin container.

apiVersion: v1
kind: ConfigMap
metadata:
  name: vela-cmp
  namespace: argocd
data:
  plugin.yaml: |
    apiVersion: argoproj.io/v1alpha1
    kind: ConfigManagementPlugin
    metadata:
      name: vela
    spec:
      version: v1.0
      init:
        command: ["vela", "traits"]
      generate:
        command: ["sh"]
        args:
          - -c
          - |
            vela dry-run -f test-argo-oam.yml
      discover:
        # fileName: "-oam.yml*"
        find:
          # This does the same thing as fileName, but it supports
          # double-star (nested directory) glob patterns.
          glob: "**/*oam*.yml"

Apply it to the cluster:

kubectl apply -f vela-cmp.yml

This plugin is responsible for converting the KubeVela OAM manifest definition to raw Kubernetes objects, which ArgoCD is then responsible for deploying.

2. Register the plugin sidecar

To install a plugin, patch argocd-repo-server to run the plugin container as a sidecar, with argocd-cmp-server as its entrypoint.

The vela plugin runs the vela command to export manifests when the plugin is discovered on a Git manifest. Before initializing the plugin, we need to install the KubeVela CLI so the command can run successfully. The configuration below adds an init container to download the necessary CLI.

initContainers:
  - name: kubevela
    image: nginx:1.21.6
    command:
      - bash
      - '-c'
      - |
        #!/usr/bin/env bash
        set -eo pipefail
        curl -fsSl https://kubevela.io/script/install.sh | bash -s 1.6.4
    env:
      - name: VELA_INSTALL_DIR
        value: /custom-tools
    resources:
      limits:
        cpu: 50m
        memory: 64Mi
      requests:
        cpu: 10m
        memory: 32Mi
    volumeMounts:
      - name: custom-tools
        mountPath: /custom-tools
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    imagePullPolicy: IfNotPresent

After adding the init container, we need to add our custom sidecar to run the plugin properly. To use the ConfigMap plugin configuration in our sidecar, we mount the ConfigMap into the pod:

containers:
  - name: vela
    image: nginx
    command:
      - /var/run/argocd/argocd-cmp-server
    env:
      - name: KUBECONFIG
        value: /kubeconfig
    resources: {}
    volumeMounts:
      - name: var-files
        mountPath: /var/run/argocd
      - name: vela-cli-dir
        mountPath: /.vela
      - name: kubeconfig
        mountPath: /kubeconfig
        subPath: kubeconfig
      - name: plugins
        mountPath: /home/argocd/cmp-server/plugins
      - name: vela-cmp
        mountPath: /home/argocd/cmp-server/config/plugin.yaml
        subPath: plugin.yaml
      - name: cmp-tmp
        mountPath: /tmp
      - name: custom-tools
        mountPath: /usr/local/bin/vela
        subPath: vela

3. Configure Argo CD to watch this repo for Git pushes.

Create a Git repository and put the OAM manifest into the target directory; then you can create a simple ArgoCD application that watches the Git repository and applies the manifests to the local Kubernetes cluster.

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: test-2
spec:
  components:
    - name: wf-poc-app-2
      type: webservice
      properties:
        image: oamdev/helloworld-python:v1
        env:
          - name: "TARGET"
            value: "ArgoCD"
        port: 8080

4. Final Output

ArgoCD synced the applications successfully and rendered the Kubernetes resources.

img

2. Kubevela Controller + ArgoCD Gitops syncer

The second approach uses the KubeVela GitOps controller on the server side, with ArgoCD as our GitOps syncer. This approach can flexibly use the native KubeVela feature set without a custom plugin or the dry-run module. We just need to add the annotation below to our manifest repository to ignore OutOfSync:

argocd.argoproj.io/compare-options: IgnoreExtraneous

In this example, ArgoCD tracks the native KubeVela application resources and their revisions.
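Following the doc's suggestion of annotating the manifests in the repository, a hedged sketch of where the annotation could go is the KubeVela Application manifest itself, so that ArgoCD does not flag the resources the KubeVela controller derives from it:

```yaml
# Illustrative placement of the compare-options annotation.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: first-vela-app
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
spec:
  # ...components as usual...
```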

img

Conclusion

1. Vela Dry-run approach

  • KubeVela can be installed on a different cluster from ArgoCD.
  • Limited KubeVela features: only component and trait definitions are used.
  • Workflow and policy features are not supported this way.
  • Depends on KubeVela's dry-run feature.
  • No extra RBAC, UI, or any experience different from Argo is needed.

2. Kubevela Controller + ArgoCD Gitops syncer

  • KubeVela must be installed in the ArgoCD cluster.
  • We can use all the features of KubeVela.
  • No dependency on anything else.

Reference: https://gokhan-karadas1992.medium.com/argocd-kubevela-integration-eb88dc0484e0

Big thanks to Erkan Zileli 🎉🎉🎉

Da Yin

Since the Open Application Model was invented in 2020, KubeVela has gone through tens of version changes and evolved advanced features toward modern application delivery. Recently, KubeVela proposed to become a CNCF incubation project and delivered several public talks in the community. As a memorandum, this article looks back at the starting points and gives a comprehensive introduction to the state of KubeVela in 2022.

What is KubeVela?

KubeVela is a modern software platform that makes delivering and operating applications across today's hybrid, multi-cloud environments easier, faster and more reliable. It has three main features:

  • Infrastructure agnostic: KubeVela is able to deploy your cloud-native application to various destinations, such as Kubernetes multi-clusters, cloud provider runtimes (like Alibaba Cloud, AWS or Azure) and edge devices.
  • Programmable: KubeVela has abstraction layers for modeling applications and delivery processes. The abstraction layers allow users to build higher-level reusable modules for application delivery in programmable ways and to integrate arbitrary third-party projects (like FluxCD, Crossplane, Istio, Prometheus) into the KubeVela system.
  • Application-centric: there are rich tools and ecosystems designed around KubeVela applications that add extra capabilities for delivering and operating the applications, including CLI, UI, GitOps, observability, etc.

KubeVela cares about the whole lifecycle of applications, including both the Day-1 delivery and the Day-2 operating stages. It is able to connect with a wide range of continuous integration tools, like Jenkins or GitLab CI, and helps users deliver and operate applications across hybrid environments. Slide2.png

Why KubeVela?

Challenges and Difficulties

Nowadays, the fast growth of cloud-native infrastructure has given users more and more capabilities for deploying applications, such as high availability and security, but it also exposes an increasing number of complexities directly to application developers. For example, the Ingress resource on Kubernetes enables users to expose their applications easily, but developers need to handle Ingress upgrades when the underlying Kubernetes version shifts, which requires knowledge of the Ingress resource. Hybrid deployment across various cloud providers can make this problem even harder. These difficulties are caused by the lack of operational input in the application definition: developers must face the infrastructure details directly if they want to enjoy the benefits brought by the rich cloud-native community. Slide3.png

Open Application Model

To tackle the above challenges and bridge the gap between the use of applications and the understanding of infrastructure details, the Open Application Model (OAM) was jointly proposed by Alibaba Cloud and Microsoft Azure in 2020. The aim is to define a consistent application model for application delivery, irrelevant to platforms and implementations. The application model describes an interface for developers covering what an application consists of and how it should work. The former is known as the Component in OAM, which is usually used to model the workloads of the application. The latter is defined as the Trait in OAM, which attaches extra capabilities to Components. Slide4.png

KubeVela as OAM

KubeVela is one of the implementations of the Open Application Model. In KubeVela, the abstraction layer is powered by CUE, a novel configuration programming language that can describe complex rendering logic and works as a superset of JSON. The abstraction layer simplifies the configuration of resources in Kubernetes: it hides the implementation details and exposes a limited set of parameters to developers. With the KubeVela application, it is easy for developers to focus on the central logic of applications, like what container image should be used and how the service should be made accessible. Slide5.png To achieve that, best practices of using Kubernetes native resources are summarized into KubeVela X-Definitions, which provide rendering templates for resources using CUE. These templates can be obtained from various sources, including official repositories, community addons, or even custom implementations by system operators. The templates are mostly infrastructure-implementation agnostic; in other words, they are not necessarily bound to specific infrastructures. Developers do not need to be aware of the underlying infra when using these templates.

Components & Traits

The application model divides the abstraction of infra into two different aspects. The Component describes the main workload, which in Kubernetes can be Deployments, StatefulSets, Helm releases, etc. The Trait, on the other hand, describes an added capability for the main workload, such as the scaler trait specifying the number of replicas or the gateway trait aggregating the endpoints for access. The separation of concerns in the design of Component and Trait gives high extensibility and reusability to the abstraction. Slide6.png For example, the gateway trait could be backed by different infrastructures like Ingress or HTTPRoute. The application developer who uses the trait only needs to care about the exposed parameters, including the path, port and domain. The trait can be attached to various types of workloads, abstracted by different types of components, such as Deployment, StatefulSet, CloneSet, etc. Slide7.png In cases where application developers and SREs are in different teams, KubeVela makes a clear division of their responsibilities.

  • The platform team, which provides the infrastructure, is responsible for building up X-Definitions, in which they enforce best practices and deployment confidence.
  • The end users only need to choose the Components and Traits provided by the platform team and use them to assemble applications. They can simply enjoy a PaaS-like experience instead of directly interacting with the infra behind it.

These are made possible thanks to the flexible, extensible and programmable system of KubeVela and can be applied under varying environments. Slide8.png
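An end-user application assembled from platform-provided Components and Traits could look like the hedged sketch below; webservice, scaler, and gateway are built-in types, while the image, domain, and names are illustrative:

```yaml
# Illustrative end-user application using platform-provided types.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: web-demo
spec:
  components:
    - name: frontend
      type: webservice
      properties:
        image: oamdev/hello-world
        ports:
          - port: 8000
      traits:
        - type: scaler              # number of replicas
          properties:
            replicas: 3
        - type: gateway             # expose the service via a domain
          properties:
            domain: demo.example.com
            http:
              "/": 8000
```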

Unified Delivery

Application delivery could happen everywhere. Therefore, another goal for KubeVela application is to build up unified delivery and provide consistent usage for users under various scenarios.

Hybrid-Cloud & Multi-Cluster

In addition to the abstraction layer, KubeVela also supports hybrid-cloud and multi-cluster architectures natively, as modern cloud-native applications are not only about containers but involve lots of cloud resources as well. Besides, more and more users and teams face the difficulty of delivering applications to various environments or multiple clusters for different purposes, such as testing or high availability. Slide9.png The KubeVela application allows users to define delivery targets and differentiated configurations through policies. The abstraction hides the details of how clusters are registered and connected and provides runtime-agnostic usage to app developers. Slide10.png
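The policy part of such an application could be sketched as follows; the topology policy is built-in, while the cluster names and labels are illustrative:

```yaml
# Illustrative delivery-target policies.
policies:
  # Deliver to two explicitly named registered clusters
  - name: edge-clusters
    type: topology
    properties:
      clusters: ["beijing", "shanghai"]
  # Or select delivery targets by cluster label
  - name: by-region
    type: topology
    properties:
      clusterLabelSelector:
        region: us-west
```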

Addon Integration

To enrich the delivery capability, users can leverage KubeVela addons to extend their system. The addons are discoverable, reusable and easy-to-install capability bundles. They usually contain capability providers, including a wide range of third-party projects, like FluxCD, ClickHouse, Crossplane, etc. Addons not only install those projects into the system but also create the corresponding definitions for the integration, which extends the types of Component and Trait that application developers can use. The KubeVela community currently has 50+ addons already. Platform builders can enjoy these out-of-the-box integrations in their systems depending on their customized demands. Slide11.png

With addons enabled in the system, it would be possible for end users to assemble applications in more customized ways, such as deploying cloud resources or using advanced workloads. Slide12.png

KubeVela Workflow

While the Open Application Model defines the composition of an application, in real cases the delivery process of the composition can still vary a lot. For example, the different components in one application could have interdependencies or data passing, where delivery steps must be executed in a specific order. Furthermore, the delivery process sometimes also involves more actions than delivering resources, such as rollouts or notifications. An extensible workflow is therefore designed to fulfill the needs of process customization in KubeVela. Slide13.png Similar to Component and Trait, KubeVela workflow also leverages CUE to define workflow steps, providing flexibility, extensibility and programmability. It can be seen as another form of Infrastructure as Code (IaC). A bunch of built-in workflow steps already provides rich out-of-the-box capabilities in KubeVela, such as making multi-cluster deployments and sending notifications through Slack or email. The lightweight engine ensures high performance and safety of step executions, compared to other types of engines that involve running extra containers. Slide14.png Unlike the Component and Trait definitions in KubeVela, the WorkflowStep definition does not render templates into resources. Instead, it describes the actions to be executed in the step, which call underlying atomic functions in various providers. Slide15.png With the use of workflow and addons, users are able to build arbitrary delivery processes and make customized integrations. For example, it is possible to let continuous integration tools trigger the delivery of KubeVela applications, or to implement GitOps solutions combining FluxCD and other addons. Slide16.png
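A delivery process built from built-in steps could be sketched as below; deploy and notification are built-in step types, while the Slack webhook URL is a placeholder:

```yaml
# Illustrative workflow: deploy, then notify a Slack channel.
workflow:
  steps:
    - name: deploy-resources
      type: deploy
      properties:
        auto: true          # proceed without manual approval
    - name: notify
      type: notification
      properties:
        slack:
          url:
            value: https://hooks.slack.com/services/xxx   # placeholder
          message:
            text: "Application delivered."
```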

Day-2 Management

KubeVela cares about more than Day-1 delivery. It also provides unified Day-2 application management capabilities on top of all its extensibility. Day-2 management is necessary for system operators and application developers to continuously operate the delivered applications and ensure the applications are always under control.

Resource Management

The basic capabilities for application management concern its resources. KubeVela's core controller continuously watches the difference between the current state and the desired state of delivered resources. It makes sure that the live spec accords with the declared spec recorded during delivery, and therefore effectively prevents configuration drift. Slide18.png Besides, automated garbage collection helps recycle the resources that are no longer in use after upgrades or deletion. There are also times when resources need to be shared across multiple applications. All of this is made possible in KubeVela applications through the use of policies. Slide19.png
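As a sketch of how policies tune this behavior, the application below (component and policy names are hypothetical) uses the `garbage-collect` policy to keep resources from outdated revisions instead of recycling them during upgrades:

```yaml
# Illustrative KubeVela Application using a garbage-collect policy.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: demo-app
spec:
  components:
    - name: web
      type: webservice
      properties:
        image: nginx:1.23
  policies:
    # Keep resources of old revisions instead of recycling them
    # automatically when the application is upgraded.
    - name: keep-legacy
      type: garbage-collect
      properties:
        keepLegacyResource: true
```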

Version Control

KubeVela applications keep history records for deliveries. These snapshots are useful when a new release does not behave as expected: change inspection can be used to diagnose possibly erroneous changes, and rollback allows fast recovery to the previous successful state. Slide20.png

Observability

KubeVela treats observability as a first-class citizen. It is the users' eyes for monitoring the state of applications and spotting exceptions. There are multiple tools and methods in KubeVela to do the observation job. One of the most straightforward ways is the KubeVela CLI, which provides in-time status info for the application at fine-grained or aggregated levels. Slide21.jpg For users who prefer web interfaces, VelaUX provides an alternative way to view application status. Slide22.png In cases where applications are monitored through third-party projects, such as Grafana, Prometheus or Loki, KubeVela further provides addons for bootstrapping the observability infrastructure and empowers users to customize the observation rules as code in applications, through the abstraction layer. Slide23.png A series of out-of-the-box metrics and dashboards gives users basic automated system observability. These can be used to diagnose system-level exceptions and help improve overall performance. Slide24.png

Eco-system

In addition to the above mentioned tools, KubeVela also has several other tools in the eco-systems to facilitate application delivery.

  • Vela CLI: KubeVela CLI provides various commands that help you operate applications, such as managing definitions, viewing resources, restarting workflows, and rolling back versions.
  • VelaUX: VelaUX is the Web UI for KubeVela. Besides, it incorporates business logics into fundamental APIs and provides out-of-box user experiences for non-k8s-expert users.
  • Terraform Controller: The terraform controller in KubeVela allows users to use Terraform to manage cloud resources through Kubernetes Custom Resources.
  • Cluster Gateway: The gateway that provides unified multi-cluster access interface. Working as Kubernetes Aggregated API Server, the gateway leverages the native Authentication and Authorization modules and enforces secure and transparent access to managed clusters.
  • VelaD: Building on top of k3s & k3d, VelaD integrates KubeVela with Kubernetes cores, which can be extremely helpful for building dev/test environment.
  • Vela Prism: The extension API server for KubeVela built upon the Kubernetes Aggregated API Server. It projects native APIs like creating dashboards on Grafana into Kubernetes resource APIs, so that users can manage 3rd-party resources as Kubernetes native resources.
  • Vela Workflow: The workflow engine translates CUE-based steps and executes them. It works as a pure delivery tool and can also be used standalone, apart from the KubeVela application. Compared to Tekton, it mainly organizes the process in CUE style, instead of using Pods and Jobs directly.

Slide25.png

Stability

To ensure KubeVela is able to handle a certain amount of applications under limited resources, multiple load tests have been conducted under various circumstances. The experiments have demonstrated that KubeVela is capable of dealing with thousands of applications in an ordinary-sized cluster. The observability infrastructure further exposes the bottlenecks of KubeVela and guides system operators in customized tuning to improve performance in specific environments. Slide26.png

In a nutshell

Currently, KubeVela has already been applied in production by a number of adopters from various areas. Some mainly use KubeVela's abstraction capability to simplify the use and deployment of applications. Some build application-centric management systems upon KubeVela. Some use the customized workflow to orchestrate the delivery process. It is especially welcomed in high-tech industries and has proven helpful for delivering and managing enormous numbers of applications.

Slide27.png

The KubeVela community has attracted worldwide contributors and has continuously evolved over the past two years. Nowadays, over 200 contributors from various countries have participated in the development of KubeVela. Thousands of issues have been raised, and 85% of them are already solved. There are also bi-weekly community meetings held in both the English and Chinese communities.

Slide28.png

With more and more people coming into the community, KubeVela is consistently upgrading itself to fit into more complex, varying use cases and scenarios.

· One minute read
Daniel Higuero

Application Delivery on Kubernetes

The cloud-native landscape is formed by a fast-growing ecosystem of tools with the aim of improving the development of modern applications in a cloud environment. Kubernetes has become the de facto standard to deploy enterprise workloads by improving development speed, and accommodating the needs of a dynamic environment.

Kubernetes offers a comprehensive set of entities that enables any potential application to be deployed into it, independent of its complexity. This, however, has a significant impact from the point of view of adoption. Kubernetes is becoming as complex as it is powerful, and that translates into a steep learning curve for newcomers to the ecosystem. Thus, a new trend has emerged, focused on providing developers with tools that improve their day-to-day activities without losing the capabilities of the underlying system.

Napptive Application Platform

The NAPPTIVE Playground offers a cloud native application platform focused on providing a simplified method to operate on Kubernetes clusters and deploy complex applications without needing to work with low-level Kubernetes entities. This is especially important to accommodate non-developer user personas as the computing resources of a company are consolidated into multi-purpose, multi-tenant clusters. Data scientists, business analysts, modeling experts and many more can benefit from simple solutions that do not require any knowledge of cloud computing infrastructure or orchestration systems such as Kubernetes, enabling them to run their applications with ease on the existing infrastructure.

Any tool that works in this space must start by analyzing the existing abstractions. In particular, at Napptive, we focus on Applications. This well-understood abstraction is not present in Kubernetes, requiring users to manually tag, identify and reason about the different components involved in an application. The Open Application Model provides an excellent abstraction to represent any type of application independently of the cloud provider, containerization technology, or deployment framework. The model is highly customizable by means of adding Traits, Policies, or new Component Definitions.

KubeVela in Napptive

The Napptive Playground leverages Kubevela as the OAM runtime for Kubernetes deployments. Our Playground provides an environment abstraction with multi-tenant guarantees that is equivalent to partitioning a shared cluster by means of differentiated namespaces. The benefit of our approach is the transparency of its configuration using a higher abstraction level that does not involve any Kubernetes knowledge.

The following diagram describes the overall architecture of Napptive and Kubevela deployed in a Kubernetes cluster.

napptive-arch

The user has the ability to interact with the cluster by using the Napptive user interface (CLI or Web UI), or by means of using the standard Kubernetes API with kubectl. Isolated environments can be easily created to establish logical separations such as type of environment (e.g., deployment, staging, production), purpose (e.g., projectA, projectB), or any other approach to differentiate where to deploy an application. Once the OAM application is deployed on Kubernetes, KubeVela is in charge of managing the low-level application lifecycle, creating the appropriate Kubernetes entities as a result. One of the main differences from other adopters of KubeVela is the fact that we use KubeVela in a multi-tenant environment. The following figure shows the specifics of an application deployment in this type of scenario.

napptive-arch

The Napptive Playground is integrated with Kubernetes RBAC and offers a native user management layer that works in both on-premise and cloud deployments. User identity is associated with each environment, and Kubevela is able to take that information to ensure that users can only access their allowed resources. Once a user deploys an OAM application, Kubevela will intercept the call and attach the user identity as an annotation. After that, the rendering process will ensure that the user has access to the entities (e.g., trait, component, policy, workflow step definitions) and that the target namespaces are also accessible by the user. Once the application render is ready, the workflow orchestrator will create the different entities in the Kubernetes namespace.
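Conceptually, the intercepted request leaves the application carrying the requester's identity, which the controller then uses for access checks. The sketch below is illustrative: the application and namespace names are hypothetical, and the exact annotation key is an assumption based on KubeVela's authentication feature:

```yaml
# Sketch: an Application annotated with the requesting user's identity
# after the webhook intercepts the deployment call. The annotation key
# shown here is an assumption, not a verified constant.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: user-app
  namespace: env-projecta
  annotations:
    app.oam.dev/username: "alice"  # identity attached on admission
spec:
  components:
    - name: backend
      type: webservice
      properties:
        image: ghcr.io/example/backend:v1
```

During rendering, the controller can verify that "alice" may read the referenced definitions and write to the target namespaces before any resource is created.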

Napptive in the Community

In terms of our involvement with the OAM/KubeVela community, it has evolved over time from being passive members simply exploring the possibilities of OAM in Kubernetes to becoming active contributors in different areas. In particular, we have worked closely with the core KubeVela development team to test and overcome the different challenges related to using a multi-tenant, RBAC-compliant installation of KubeVela. Security in this type of installation is critical, and it is one of our main focuses within the KubeVela community. We are committed to continue working with the community, not only to ensure that multi-tenancy is maintained throughout the different releases, but also to add our own perspective representing our customer use cases.

The Kubevela community is growing at a fast pace, and we try to contribute our adaptations, features, feedback, and thoughts back to the community. This type of framework is deployed in a multitude of environments. The most common one is probably where one or more clusters are inside a company. In our particular application, we are interested in exploring methods to offer computing capabilities for a variety of user personas in a shared cluster, and we contribute our view and experience in the community.

Future Evolution

In terms of future evolution, we believe the Napptive Playground is a great tool to experience working with the Open Application Model without the need to install new clusters and frameworks. From the point of view of our contributions to the community, we are internally exploring QA mechanisms that ensure customer applications keep working after upgrades, and identifying potential incompatibilities between releases. Once that work is ready, we plan to contribute our testing environment setup back to the community so that it can be adopted in the main branch. We are also excited about new functionalities that have recently been added to KubeVela, such as multi-cluster support, and are exploring methods to adopt it inside the Playground. Moreover, we are actively working on an OAM-compatible application catalog that will simplify the way organizations store and make application definitions accessible, so that they can be deployed into a cluster from a central repository. The catalog focuses on the OAM entities and relies on existing container registries for storing the images.

· One minute read
Jianbo Sun

Today, Ding Yu, General Manager of Alibaba Cloud's Cloud Native Application Platform, officially announced a major upgrade of KubeVela at the Apsara Conference! This upgrade is a qualitative leap built on KubeVela's continuous incremental evolution from application delivery to application management, and it also pioneers an industry-first approach to building a unified delivery-and-management application platform on top of an extensible model.

· One minute read
Hongye Jiang

KubeVela addons provide a convenient way to extend KubeVela's capabilities. As we know, KubeVela is a highly extensible platform with a microkernel design: users can extend KubeVela's system capabilities through Definitions, and addons are the core feature for packaging and distributing these custom extensions together with their dependencies. Moreover, the KubeVela community's addon catalog keeps growing and now contains more than 50 addons, covering a wide range of scenarios including observability, microservices, FinOps, cloud resources, and security.

This blog post gives a comprehensive introduction to the core mechanisms of KubeVela addons and teaches you how to write a custom addon. At the end, we will show what using an addon looks like for end users, and how addons fit into the KubeVela platform to provide a consistent experience.

· One minute read
Qingguo Zeng

KubeVela 1.5 was officially released recently. This version brings the community more out-of-the-box application delivery capabilities, including new system observability; a new Cloud Shell terminal that brings the Vela CLI into the browser; enhanced canary releases; and an optimized workflow for multi-environment application delivery. It further improves and polishes KubeVela's highly extensible experience as an application delivery platform. In addition, the community has formally started promoting the project to the CNCF Incubation stage, and several benchmark users have shared their practices in community meetings, which testifies to the community's healthy growth. The project has reached milestones in both maturity and adoption, thanks to the contributions of more than 200 developers in the community.

· One minute read

Background

Helm is a widely adopted client-side tool for packaging and deploying applications in the cloud-native world. Its concise design and ease of use have won users over and formed an ecosystem of its own; by now, nearly ten thousand applications are packaged as Helm Charts. Helm's design philosophy is concise enough to be summarized in the following two principles:

  1. Package and templatize complex Kubernetes APIs, abstracting and simplifying them into a small number of parameters.
  2. Provide an application lifecycle solution: creation, upload (hosting), versioning, distribution (discovery), and deployment.
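The first principle is what Helm's templating gives you: a verbose Kubernetes object collapses into a couple of chart values. A minimal, purely illustrative deployment template might look like this:

```yaml
# templates/deployment.yaml -- illustrative Helm template: the full
# Deployment API surface is reduced to replicaCount and image values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
        - name: app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```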

These two design principles keep Helm flexible yet simple enough to cover all Kubernetes APIs, which nicely solves the one-off delivery of cloud-native applications. However, for enterprises at a certain scale, using Helm for continuous software delivery poses significant challenges.

The Challenges of Continuous Delivery with Helm

Helm was designed for simplicity and ease of use from day one, so it gave up complex component orchestration. When deploying an application, Helm delivers all resources to the Kubernetes cluster at once, expecting Kubernetes' eventually-consistent, self-healing capabilities to automatically resolve the application's dependencies and orchestration. Such a design may be fine for a first deployment, but for an enterprise production environment of any scale, it is overly idealistic.

On one hand, updating all resources at once during an upgrade can easily cause an overall service outage because some services become briefly unavailable. On the other hand, if the software has a bug, it cannot be rolled back in time, and the blast radius easily grows out of control. In even worse scenarios, if part of the production configuration has been modified manually by operators, a one-shot Helm deployment overwrites all of those changes; since the previous Helm release may differ from the actual production state, rollback cannot restore it either, causing an even wider outage.

This shows that once you reach a certain scale, the ability to do canary releases and rollbacks in production is critically important, and Helm by itself cannot guarantee sufficient stability.

How to Do a Canary Release for Helm?

Normally, a rigorous software upgrade follows a process roughly like this, divided into three stages: the first stage upgrades a small portion (say 20%) of the instances and switches a small portion of the traffic to the new version, then pauses. After manual confirmation, the second stage upgrades a larger portion (say 90%) of the instances and traffic, then pauses again for manual confirmation. The final stage upgrades everything to the new version and verifies it, completing the whole release. If any anomaly is found during the upgrade, including in business metrics, for example abnormally high CPU or memory usage or too many 500-error logs, you can roll back quickly.
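The instance counts per batch follow directly from the weights. The sketch below is illustrative arithmetic only (rounding up is an assumption, not kruise-rollout's exact rule), showing how 20% / 90% / full batches map onto a 5-replica workload:

```python
import math

def canary_plan(replicas, weights):
    """For each batch weight (in percent), compute how many pods run
    the new version and how many keep the old one. Rounding up is an
    illustrative assumption, not kruise-rollout's exact behavior."""
    plan = []
    for w in weights:
        new = math.ceil(replicas * w / 100)
        plan.append((w, new, replicas - new))
    return plan

# Three batches: 20% -> 90% -> full release, on 5 replicas.
for weight, new_pods, old_pods in canary_plan(5, [20, 90, 100]):
    print(f"{weight}% traffic: {new_pods} new / {old_pods} old pods")
```

Note that at the 90% batch, rounding already requires all five replicas on the new version, which matches the behavior observed later in this walkthrough.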

image

The above is a typical canary release scenario. So for a Helm Chart application, how do we implement this process? Two typical industry approaches are:

  1. Modify the Helm Chart to duplicate the workload, expose different Helm parameters for each copy, and keep adjusting the images, replica counts and traffic ratio of the two workloads during the release to achieve a canary release.
  2. Modify the Helm Chart to replace the original basic workload with a custom workload that has the same functionality plus canary-release capabilities, expose Helm parameters, and manipulate these canary CRDs during the release.

Both solutions are complex and come with a considerable migration cost; in particular, when your Helm Chart is a third-party component you cannot modify, or you are not able to maintain a Helm Chart yourself, these approaches are simply not feasible. Even if you did refactor the chart, compared with the original simple workload mode there would still be a non-trivial stability risk. The root cause is that Helm positions itself as a package management tool; canary releases and workload management were never part of its design.

In fact, after in-depth conversations with a large number of community users, we found that most users' applications are not complex: they are mostly classic types such as Deployment and StatefulSet. So, through KubeVela's ( http://kubevela.net/ ) powerful addon mechanism, we joined forces with the OpenKruise (https://openkruise.io/) community to build a canary release addon for these specific workload types. This addon lets you do canary releases of a Helm Chart easily, without any migration or modification. What's more, if your Helm Chart is more complex, you can build an addon customized for your own scenario and get the same experience.

Below, we will walk you through the complete flow with a concrete example (using a Deployment workload).

Canary Release with KubeVela

Environment Setup

  • Install KubeVela
$ curl -fsSl https://static.kubevela.net/script/install-velad.sh | bash
velad install

See this document for more installation details.

  • Enable the relevant addons
$ vela addon enable fluxcd
$ vela addon enable ingress-nginx
$ vela addon enable kruise-rollout
$ vela addon enable velaux

In this step, we enable the following addons:

  1. the fluxcd addon gives us the ability to deliver helm charts;
  2. the ingress-nginx addon provides traffic management for canary releases;
  3. the kruise-rollout addon provides the canary release capability;
  4. the velaux addon provides UI operations and visualization.
  • Map the nginx ingress-controller port to local
$ vela port-forward addon-ingress-nginx -n vela-system

First Deployment

Deploy the helm application for the first time by running the command below. In this step, we deploy through the vela CLI; if you are familiar with Kubernetes, you can also deploy it with kubectl apply, with exactly the same effect.

cat <<EOF | vela up -f -
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: canary-demo
  annotations:
    app.oam.dev/publishVersion: v1
spec:
  components:
    - name: canary-demo
      type: helm
      properties:
        repoType: "helm"
        url: "https://wangyikewxgm.github.io/my-charts/"
        chart: "canary-demo"
        version: "1.0.0"
      traits:
        - type: kruise-rollout
          properties:
            canary:
              steps:
                # The first batch releases 20% of the Pods and routes 20% of the traffic to the new version; manual confirmation is required before the release continues.
                - weight: 20
                # The second batch releases 90% of the Pods and routes 90% of the traffic to the new version.
                - weight: 90
              trafficRoutings:
                - type: nginx
EOF

In the example above, we declare an application named canary-demo that contains a helm-type component (KubeVela also supports deploying other component types); the component's properties carry the chart's address, version, and other information.

In addition, we declare the kruise-rollout trait for this component, which is the capability provided by the kruise-rollout addon. It specifies the helm upgrade strategy: the first stage upgrades 20% of the instances and traffic, then, after manual confirmation, upgrades 90%, and finally upgrades fully to the latest version.

Note that, to make the demo intuitive (showing the version change), we prepared a dedicated chart. The body of this helm chart contains a Deployment and an Ingress object, the most common layout when authoring helm charts. If your helm chart contains the same kinds of resources, you can follow this example to do canary releases as well.

After the deployment succeeds, access the gateway address inside your cluster with the command below, and you will see the following:

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V1

In addition, on the VelaUX resource topology page, we can see that all five V1 instances are ready.

image

Upgrading the Application

Apply the following yaml to upgrade your application.

cat <<EOF | vela up -f -
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: canary-demo
  annotations:
    app.oam.dev/publishVersion: v2
spec:
  components:
    - name: canary-demo
      type: helm
      properties:
        repoType: "helm"
        url: "https://wangyikewxgm.github.io/my-charts/"
        chart: "canary-demo"
        # Upgrade to version 2.0.0
        version: "2.0.0"
      traits:
        - type: kruise-rollout
          properties:
            canary:
              steps:
                # The first batch releases 20% of the Pods and routes 20% of the traffic to the new version; manual confirmation is required before the release continues.
                - weight: 20
                # The second batch releases 90% of the Pods and routes 90% of the traffic to the new version.
                - weight: 90
              trafficRoutings:
                - type: nginx
EOF

Note that the new application differs from the first deployment in only two places:

  1. The app.oam.dev/publishVersion annotation is upgraded from v1 to v2, indicating that this change is a new release version.
  2. The helm chart version is upgraded to 2.0.0; in that chart version, the deployment image tag is upgraded to V2.

After a while, we will find that the upgrade stops at the first batch defined above, i.e., only 20% of the instances and traffic are upgraded. If you now run the gateway-access command above repeatedly, you will see Demo: V1 and Demo: V2 appear alternately, with roughly a 20% chance of getting Demo: V2.

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V2

Looking at the application's resource topology again, we can see that the rollout CR created by the kruise-rollout trait has created one new-version instance for us, while the five old-version instances created by the original workload remain unchanged.

image

Next, run the following command with the vela CLI to pass manual review and resume the upgrade:

$ vela workflow resume canary-demo

After a while, the resource topology shows that five new-version instances have been created. If we access the gateway again now, the probability of seeing Demo: V2 rises substantially, to about 90%.

Fast Rollback

In a real-world release, it often happens that, after manual review, the new version is found to be in an abnormal state, and you need to abort the current upgrade and quickly roll the application back to the version before the upgrade started.

We can run the following command to suspend the current release workflow first:

$ vela workflow suspend canary-demo
Rollout default/canary-demo in cluster suspended.
Successfully suspend workflow: canary-demo

Then roll back to the version before the release, namely V1:

$ vela workflow rollback canary-demo
Application spec rollback successfully.
Application status rollback successfully.
Rollout default/canary-demo in cluster rollback.
Successfully rollback rollout
Application outdated revision cleaned up.

Now, if we access the gateway again, we will find that all request results have returned to the V1 state.

$ curl -H "Host: canary-demo.com" http://localhost:8080/version
Demo: V1

At this point, the resource topology shows that the canary-version instances have all been deleted, and from start to finish, the five v1 instances, as the stable version, never changed at all.

image

If you replace the rollback operation above with resuming the upgrade, the remaining upgrade process will continue and complete the full release.

For the complete walkthrough of the demo above, please refer to the documentation.

If you want to implement the above process directly with native k8s resources, refer to the documentation. In addition, besides Deployment, the kruise-rollout addon also supports StatefulSet and OpenKruise's CloneSet; if the workload type in your chart is any of these three, you can follow the example above to do canary releases.

As you may have noticed, the example above uses a layer-7 traffic-splitting solution based on nginx-ingress-controller. We also support the Kubernetes Gateway API, which enables more gateway types and layer-4 traffic-splitting solutions.

How Is the Stability of the Release Process Guaranteed?

After the first deployment completes, the kruise-rollout addon (rollout for short) watches the resources deployed by the Helm Chart, in our example the deployment, service and ingress; StatefulSet and OpenKruise CloneSet are supported as well. rollout takes over the subsequent upgrades of this deployment.

When an upgrade starts, the new Helm release takes effect first and updates the deployment's image to v2. At this point, however, the deployment's upgrade process is taken over by rollout from the controller-manager, so the Pods under the deployment are not upgraded. At the same time, rollout copies a canary version of the deployment with image tag v2, creates a service that selects only the instances under it, plus an ingress that points to this service, and finally, by setting the corresponding annotations on the ingress, lets this ingress carry the canary traffic (see the documentation for details), thereby achieving traffic splitting.

After all the manual confirmation steps have passed and the full release is performed, rollout hands control of the stable deployment's upgrade back to the controller-manager, at which point the stable instances are upgraded to the new version one after another. Only after all stable instances are ready are the canary deployment, service and ingress destroyed in turn. This guarantees that, throughout the process, request traffic never hits instances that are not ready, avoiding request errors and achieving a lossless canary release.
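The traffic-splitting step described above relies on ingress-nginx's canary annotations. A sketch of what such a generated canary Ingress could look like (resource names are illustrative; the nginx.ingress.kubernetes.io/canary* annotations are the standard ingress-nginx canary API):

```yaml
# Illustrative canary Ingress routing a fraction of traffic to the
# canary service that selects only the v2 pods.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary-demo-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"  # 20% of requests
spec:
  rules:
    - host: canary-demo.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: canary-demo-canary  # canary-only service
                port:
                  number: 80
```

Raising canary-weight in later batches shifts more traffic to the canary service until the full release completes.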

We will keep iterating in the following areas to support more scenarios and deliver a more stable and reliable upgrade experience:

  1. Integrate the upgrade process with KubeVela's workflow system, introducing a richer ecosystem of intermediate step extensions, so that the workflow can, for example, send notifications during the upgrade. Even at the pause stage of each step, it can connect to external observability systems, check logs or monitoring metrics, and automatically decide whether to continue the release or roll back, achieving unattended release strategies.
  2. Integrate more addons such as istio to support service-mesh traffic-splitting solutions.
  3. Beyond percentage-based traffic splitting, support header- or cookie-based traffic-splitting rules, as well as features such as blue-green releases.

Summary

As mentioned earlier, KubeVela's support for Helm canary releases is implemented entirely through the addon system. The fluxcd addon helps us deploy helm charts and manage their lifecycle; the kruise-rollout addon upgrades the workload instances and switches traffic during the upgrade. Combining the two addons gives us full lifecycle management and canary upgrades of helm applications, without any modification to your Helm Chart. You can also write your own addon for more specialized scenarios or processes.

Thanks to KubeVela's powerful extensibility, you can not only combine these addons flexibly, but also dynamically swap the underlying capability implementations for different platforms or environments while keeping the upper-layer application unchanged. For example, if you would rather use argocd instead of fluxcd to deploy helm applications, you can enable an argocd addon to achieve the same functionality, with no change or migration needed for the upper-layer helm application.

The KubeVela community already provides dozens of addons that can extend the platform with capabilities in observability, GitOps, FinOps, rollout, and more.

image

The addon catalog repository is https://github.com/kubevela/catalog . If you are interested in addons, you are very welcome to submit your custom addons to this repository and contribute new ecosystem capabilities to the community!

· One minute read

As you may have learned from this blog post, we can use vela to manage cloud resources (such as S3 buckets, AWS EIPs, etc.) through the terraform addon. We can create an application containing some cloud-resource components; the application will provision these cloud resources, and we can then manage them with vela.

Sometimes we already have Terraform cloud resources that were created and managed by the Terraform binary or by other programs. To gain the benefits of managing cloud resources with KubeVela, or simply for consistency in how we manage cloud resources, we may want to import these existing Terraform cloud resources into KubeVela and manage them with vela. If we simply create an application describing these cloud resources, the resources will be recreated, possibly causing errors. To solve this problem, we built a simple backup_restore tool. This blog post will show you how to use the backup_restore tool to import existing Terraform cloud resources into KubeVela.