Open Source Federated Clouds on the Horizon
Cloud computing has emerged as a model for providing access to large amounts of data and compute resources through seamless interfaces, independent of where and how the services are hosted. Ease of management, flexible resource configuration, and low-cost maintenance have driven the widespread deployment of cloud architectures worldwide.
With the sharp increase in new services offered by cloud service providers, it is becoming very hard to find a single provider that offers every service end users need in one place. We are witnessing the emergence of federations of clouds designed to satisfy complex user needs across multiple cloud environments.
Cloud Computing Today
Enterprise private clouds have grown strongly in the last few years. The telecommunications industry in particular is evaluating cloud computing virtualization models as part of its migration toward Software-Defined Networking (SDN) and Network Function Virtualization (NFV).
Contemporary cloud computing platforms remain predominantly locked in application and data silos, where interoperability and portability are largely absent. This lack of interoperability across cloud environments creates roadblocks to extracting business value from previously untapped sources of revenue.
Our understanding of these challenges is not new. In 2009, Vinton Cerf, Google Vice President and Chief Internet Evangelist, was quoted by ReadWrite.com as saying, “I am seeing a possibility of inter-cloud problems mirroring the Internet problems we had thirty or forty years ago.” While companies compete to build bigger and better cloud services, few are concerned with the basic mechanics of the inter-cloud and how to ensure that it all works efficiently, reliably, and securely. Said Cerf, “You build these clouds and they know about themselves and they know about their own resources, but they don’t know about any other cloud. So the question is: How do you say ‘send this information to this cloud over here’ if there isn’t any way to call it?”
Isolated clouds affect providers and customers alike:
- Provider lock-in: Provider-specific requirements enforce hard boundaries between clouds.
- Multiple geographic locations: 1) no single provider can establish data centers in every possible location; 2) customers cannot determine in advance the best location for hosting their services because they may not know where all of their end users are; 3) Quality-of-Service (QoS) expectations cannot be guaranteed across providers; and 4) regulatory requirements differ by country and region.
- Inflexible resource utilization: No seamless mechanisms exist for scaling hosted services across multiple, geographically distributed data centers.
Much like the challenges faced by a growing Internet, the evolution of cloud technologies has created the need for federation protocols. The benefits of cloud federation include:
- Expanding Geographical Footprint: Leading cloud service providers are establishing data centers worldwide, but they are unlikely to build them in every country simply to meet each country's regulatory requirements for local storage. Developers of federated applications will create the tools needed for fine-grained control of resource allocation and policy. Only by using multiple clouds will customers be able to offer their own clients high-performing, widely distributed, and legally compliant services.
- Better Application Resilience: Several cloud service outages in recent years, including some at major vendors, have disrupted businesses. Post-mortem recommendations include the advice that customers configure their applications to use multiple data centers for fault tolerance. Customers now want to be independent of any single data center, and immune to availability-zone outages, by spreading services across multiple cloud providers. Experts point to service unavailability as the number one inhibitor to the adoption of cloud computing. Beyond fault tolerance, using resources from different providers acts as an insurance policy against a provider becoming hamstrung for regulatory or legal reasons.
- Avoiding Vendor Lock-in: Customers who can freely move running workloads across multiple clouds gain leverage over providers, which otherwise have little incentive to avoid policy or pricing decisions that hurt their clients.
- Greater Flexibility: Many customers will run workloads in on-premises clusters with automatic overflow capacity assigned to a cloud-hosted cluster. Other customers may prefer to default to the cloud but divert privacy-sensitive workloads to run locally.
Cloud federation environments enable cloud customers to diversify their infrastructure portfolio in terms of both vendor and location. Customers can alter or expand their business practices based on location-specific vendor policies or regulatory regimes.
The big idea behind cloud computing is that a cloud service delivers constant availability, elasticity, and scalability to meet contracted customer requirements. A cloud provider should ensure adequate resources, but how much is enough in a market where overprovisioning to absorb spikes in customer workloads is common? To address this, federated clouds offer cloud service providers the following benefits:
- Expand on Demand: By offloading to other clouds, providers can scale resources much as cloud-hosted applications do within a single cloud. A cloud can keep enough resources in a ready-to-use state to meet expected load, plus a buffer for typical deviations. When workloads exceed these limits, resources from other clouds can be invoked automatically by prior arrangement.
- Better SLAs to Customers: In worst-case scenarios like resource shortages or data center outages, incoming workloads can be moved to other clouds. This means that cloud providers can offer better Service Level Agreements (SLAs) to their customers.
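The "expand on demand" idea can be sketched in a few lines. This is a minimal illustration, not any provider's actual mechanism: the names `PartnerCloud` and `plan_burst` are hypothetical, and capacity is modeled as abstract "units." The local cloud serves demand up to its capacity (assumed to already include the buffer for typical deviations), and any excess is offloaded, in order, to partners that have reserved capacity by prior arrangement.

```python
from dataclasses import dataclass


@dataclass
class PartnerCloud:
    """Hypothetical federated partner with capacity reserved by prior arrangement."""
    name: str
    reserved_units: int  # capacity the partner has agreed to lend


def plan_burst(demand: int, local_capacity: int, partners: list[PartnerCloud]) -> dict[str, int]:
    """Serve demand locally first, then offload the excess to partners in order.

    local_capacity is assumed to cover expected load plus the buffer for
    typical deviations. Raises RuntimeError if the federation cannot absorb
    the remaining load.
    """
    plan = {"local": min(demand, local_capacity)}
    excess = demand - plan["local"]
    for partner in partners:
        if excess <= 0:
            break
        lent = min(excess, partner.reserved_units)
        if lent > 0:
            plan[partner.name] = lent
            excess -= lent
    if excess > 0:
        raise RuntimeError(f"{excess} units of demand cannot be placed")
    return plan
```

For example, `plan_burst(130, 100, [PartnerCloud("cloud-b", 20), PartnerCloud("cloud-c", 50)])` keeps 100 units local, sends 20 to `cloud-b`, and sends the remaining 10 to `cloud-c`. A real arrangement would also weigh price, latency, and SLA terms when choosing partners; the fixed ordering here is a simplifying assumption.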
The telecom industry is reaching a saturation point. While capital investment grows because of increasing data use and the cost of 4G/LTE deployment, Average Revenue Per User (ARPU) is stagnant or declining because of increased competition from Over-The-Top (OTT) players such as Skype, Facebook, Google, and Netflix. The OTT industry is driving data revenues to unprecedented levels at the expense of traditional core services such as voice and Short Message Service (SMS).
Eroding margins from shrinking voice revenues, coupled with competition from pure-play cloud computing providers such as Amazon Web Services (AWS), bring the telecom carriers’ precarious strategic position into sharp focus.
With a long and successful history of embracing new technologies such as IP wireless, SMS, and MMS by collaborating globally but competing locally, carriers are uniquely prepared to embrace the interoperability standards necessary for a global, federated cloud solution.
Decades of cooperation within the telecom industry stand in sharp contrast to today’s cloud service providers, which routinely create proprietary ‘walled gardens.’ Without a federated cloud mechanism, locked-in customers remain preoccupied with price pressures just to maintain their expected SLAs, and new customers are understandably reluctant to move mission-critical applications to the cloud. Herein lies an opportunity for telecom carriers to create a global, federated cloud service to rival entrenched incumbents.
In the grand scheme of big technology shifts, cloud computing is still new. Carriers have time to get their ecosystems in order and collaborate to build a competitive service. On the vendor side of the equation, Cisco has recently announced that its ‘Intercloud Fabric’ is the centerpiece of the next generation of cloud computing and is aligned with its position on the ‘Internet of Everything (IoE).’
A federated cloud-computing environment creates large resource efficiencies that open opportunities for next-generation telecom revenue growth, including:
- More efficient use of NFV-related intra-domain resources within Evolved Packet Core (EPC) environments, moving toward future cloud federations; the same approach will also apply to optimizing base-station computing
- Integration of virtual storage and compute frameworks with MPLS/VPN service offerings, in which remote resources made available in the requester’s address space through Virtual Private Cloud (VPC) mechanisms will leverage MPLS-based SDN protocols
Cloud Federation Projects
Huawei is leading the design and development of key Mesos and Ubernetes federation projects, in close collaboration with the Mesosphere Data Center Operating System (DCOS) and Google Kubernetes teams.
The two initiatives differ in scope: Mesos federations allow heterogeneous clusters spanning the entire data center application landscape, including combinations of Cloud Foundry, Hadoop, Spark, Kubernetes, and others, whereas Ubernetes is limited to homogeneous federations of independent Kubernetes clusters.
The Mesos federation also complements the Kubernetes federation in cases where the underlying independent Kubernetes clusters are already federated via Ubernetes.
Apache Mesos Federation
Apache Mesos is an open-source cluster manager developed at the University of California, Berkeley, that abstracts CPU, memory, storage, and other compute resources from physical or virtual machines to enable fault-tolerant and elastic distributed systems to be easily built and run effectively.
A common resource management layer such as Mesos is the most essential component of the cluster-level infrastructure. Much as an operating system is needed to manage resources and provide basic services on a single computer, a system composed of thousands of computers, along with their networking and storage, requires a layer of software that provides analogous functionality at a much larger scale. This layer is typically referred to as the cluster-level infrastructure. Mesos essentially controls the mapping of user tasks to hardware resources, enforces priorities and quotas, and provides basic task management services.
Today’s Mesos cluster environment is essentially monolithic: a single instance of the Mesos control plane manages a single logical cluster, even when that cluster is composed of nodes in multiple availability zones and cloud providers. In a large Mesos installation, the operator might want to ensure that new tasks can still be scheduled across multiple frameworks even if the Mesos masters are inaccessible or have failed. The current Mesos High-Availability (HA) mode, which runs multiple masters but lets only one elected leader serve at a time, provides only a partial, active-passive solution.
The goal of the Mesos federation project is to enable fine-grained, elastic resource allocation across multiple cloud environments, using federated Mesos as the underlying resource management layer. Our approach extends the current Mesos environment with multiple Mesos masters, each controlling and accounting for the resources of one cluster while independently holding snapshots of the others’ state. The masters in the federation cooperate to coordinate work, allowing compute frameworks to schedule jobs according to their preferences for particular cloud environments and geographic locations.
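To make the placement idea concrete, here is a minimal sketch of how one federated master might use its own cluster state plus snapshots of its peers to honor a framework's location preferences. It is illustrative only: `ClusterSnapshot`, `FederatedMaster`, and `choose_cluster` are hypothetical names, not part of the Mesos API, and real Mesos works through resource offers rather than a single "choose" call. The ranking assumes frameworks list preferred regions in order and may restrict which clouds are acceptable; among equally preferred clusters, the local one wins, then the one with the most free CPUs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ClusterSnapshot:
    """Point-in-time view of one cluster's resources (may be stale for peers)."""
    name: str
    cloud: str
    region: str
    free_cpus: float


@dataclass
class FederatedMaster:
    """Hypothetical master that owns one cluster and caches peer snapshots."""
    local: ClusterSnapshot
    peers: dict[str, ClusterSnapshot] = field(default_factory=dict)

    def update_peer(self, snapshot: ClusterSnapshot) -> None:
        # Replace the cached snapshot for that peer cluster.
        self.peers[snapshot.name] = snapshot

    def choose_cluster(self, cpus: float, preferred_regions: list[str],
                       allowed_clouds: Optional[set[str]] = None) -> Optional[str]:
        candidates = [self.local, *self.peers.values()]
        fits = [c for c in candidates
                if c.free_cpus >= cpus
                and (allowed_clouds is None or c.cloud in allowed_clouds)]

        def rank(c: ClusterSnapshot) -> tuple[int, bool, float]:
            # Lower is better: preferred region first, then local over remote,
            # then the cluster with the most free CPUs.
            region_rank = (preferred_regions.index(c.region)
                           if c.region in preferred_regions else len(preferred_regions))
            return (region_rank, c.name != self.local.name, -c.free_cpus)

        return min(fits, key=rank).name if fits else None
```

For example, a master owning a small `eu-west` data center with 4 free CPUs would place a 2-CPU job locally but send an 8-CPU job to a peer cluster in `eu-west` if one has room. Because peer snapshots can be stale, a production design would still need the chosen master to confirm the resources before committing.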
Key benefits of enabling the federated Mesos environment include:
- Scalability: Cloud-bursting to accommodate peak demand
- Collaboration: Sharing of infrastructure between partner data centers
- Multi-Site Deployments: Infrastructure aggregation across distributed data centers
- Reliability: Fault-tolerant architectures across sites
- Performance: Service deployment closer to end users
- Cost: Dynamic placement to reduce overall infrastructure cost
The design includes:
- Master communication using a distributed systems protocol (e.g., a gossip protocol)
- Changes to the Mesos language bindings so that they can connect and talk to multiple masters; for example, changing ‘mesos-go’ so that frameworks developed with ‘mesos-go’ need not change
- Enabling masters to use a distributed policy store such as HashiCorp’s Consul to agree on which master should send offers, and to which framework
- Prevention of single points of failure in the data center
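The gossip-based master communication listed above can be illustrated with a small push-pull anti-entropy simulation. This is a generic gossip sketch under stated assumptions, not the federation project's actual protocol: `GossipNode`, `gossip_round`, and `run_until_converged` are hypothetical, each master owns exactly one entry (its own cluster's free CPUs), and a per-entry version number decides which copy is newer. In every round, each master exchanges state with one randomly chosen peer, so updates spread without any central coordinator.

```python
import random


class GossipNode:
    """Hypothetical master that spreads versioned cluster state by gossip."""

    def __init__(self, name: str, free_cpus: int) -> None:
        self.name = name
        # cluster name -> (version, free_cpus); each node bumps only its own entry.
        self.view: dict[str, tuple[int, int]] = {name: (1, free_cpus)}

    def update_local(self, free_cpus: int) -> None:
        version, _ = self.view[self.name]
        self.view[self.name] = (version + 1, free_cpus)

    def merge(self, other_view: dict[str, tuple[int, int]]) -> None:
        # Keep whichever copy of each entry has the higher version.
        for key, (version, state) in other_view.items():
            if key not in self.view or self.view[key][0] < version:
                self.view[key] = (version, state)


def gossip_round(nodes: list[GossipNode], rng: random.Random) -> None:
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        # Push-pull exchange: afterwards both hold the newer entries of either.
        node.merge(peer.view)
        peer.merge(node.view)


def converged(nodes: list[GossipNode]) -> bool:
    return all(n.view == nodes[0].view for n in nodes)


def run_until_converged(nodes: list[GossipNode], rng: random.Random,
                        max_rounds: int = 100) -> int:
    rounds = 0
    while not converged(nodes):
        if rounds == max_rounds:
            raise RuntimeError("gossip did not converge")
        gossip_round(nodes, rng)
        rounds += 1
    return rounds
```

Usage: create `masters = [GossipNode(f"master-{i}", 10 * i) for i in range(5)]`, then call `run_until_converged(masters, random.Random(7))`. Every master then holds the same view of all five clusters, and a later `update_local` on any master propagates the same way. Agreeing on *who sends which offer* (the Consul item above) is a separate consensus concern that gossip alone does not settle.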
Ubernetes Project for Kubernetes Clusters Federation
The Kubernetes project on GitHub informally defines Ubernetes as a federation of Kubernetes clusters. In turn, Wikipedia describes Kubernetes as “an open source container cluster manager originally designed by Google and donated to the Cloud Native Computing Foundation that aims to provide a ‘platform for automating deployment, scaling, and operations of application containers across clusters of hosts.’”
Kubernetes is built on the same principles that allow Google to run billions of containers a week, and it is designed to scale without requiring a larger operations team.
Ubernetes is a nascent technology that connects multiple self-contained Kubernetes clusters for sharing and managing jobs across different environments, even across clouds.
Key reasons for federating a Kubernetes environment include:
- High Availability: Customers want to be immune to outages originating from a single availability zone, region, or cloud provider.
- Sensitive Workloads: Certain workloads are assigned to run on a particular cluster and cannot be scheduled or migrated to other clusters.
- Capacity Overflow: Customers typically prefer to run workloads on a primary cluster, with overflows distributed to other clusters automatically.
- Avoiding Vendor Lock-in: Customers want to distribute workloads proportionally across cloud providers.
- Cluster Size Enhancement: Kubernetes cluster size is currently limited. The community is actively working to relax these limits, because experts predict that a small maximum cluster size will be problematic if Kubernetes (also called K8s, where 8 stands for the eight letters between the K and the s) is used for large workloads or public PaaS infrastructure. The approach is to place different tenants in different clusters (possible today) and to add a unified view across them (in development).
The functional requirements derived from these use cases include:
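The Capacity Overflow and Avoiding Vendor Lock-in cases both come down to splitting a workload's replicas across clusters by policy. The sketch below shows one way to do that; it is an assumption for illustration, not the Ubernetes scheduler. `ClusterPolicy` and `distribute` are hypothetical names, and the technique swapped in is a D'Hondt-style highest-averages allocation: each replica goes to the open cluster with the largest `weight / (assigned + 1)`, ties go to the cluster listed first, and a cluster stops receiving replicas once it reaches its capacity, so the excess overflows to the others.

```python
from dataclasses import dataclass


@dataclass
class ClusterPolicy:
    """Hypothetical per-cluster distribution policy."""
    name: str
    weight: float   # relative share of replicas this cluster should receive
    capacity: int   # maximum replicas the cluster can host


def distribute(replicas: int, clusters: list[ClusterPolicy]) -> dict[str, int]:
    """Assign replicas one at a time by highest average, respecting capacity."""
    assigned = {c.name: 0 for c in clusters}
    for _ in range(replicas):
        open_clusters = [c for c in clusters
                         if c.weight > 0 and assigned[c.name] < c.capacity]
        if not open_clusters:
            raise RuntimeError("not enough capacity in the federation")
        # max() returns the first maximal element, so earlier clusters win ties.
        best = max(open_clusters, key=lambda c: c.weight / (assigned[c.name] + 1))
        assigned[best.name] += 1
    return assigned
```

With equal weights the replicas spread evenly (six replicas over three clusters gives two each). Giving a primary cluster a large weight but limited capacity, for example `ClusterPolicy("primary", 10, 5)` alongside `ClusterPolicy("overflow", 1, 100)`, fills the primary with five replicas first and overflows the rest, which matches the Capacity Overflow case.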
- Clients can register and de-register clusters.
- Workloads are spread across clusters according to workload distribution policies.
- Pods running on different clusters can discover and communicate with each other.
- Traffic to pods is load balanced across clusters.
- A control plane tracks cluster status and migrates workloads accordingly.
- Clients have a unified view and a central control point for all of these activities.
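Several of these requirements (cluster registration, status tracking with workload migration, and a unified view) can be modeled together in a toy control plane. This is a minimal sketch assuming a hypothetical `FederationControlPlane` class; it is not the Ubernetes API and ignores pod networking and load balancing. Placement is deliberately simple: each workload goes to the ready cluster with the fewest workloads (ties broken by name), and workloads with nowhere to run wait in a pending set until a cluster becomes available.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterRecord:
    name: str
    ready: bool = True
    workloads: set[str] = field(default_factory=set)


class FederationControlPlane:
    """Hypothetical central control point for a federation of clusters."""

    def __init__(self) -> None:
        self.clusters: dict[str, ClusterRecord] = {}
        self.pending: set[str] = set()  # workloads with no ready cluster to run on

    def register(self, name: str) -> None:
        if name in self.clusters:
            raise ValueError(f"cluster {name!r} already registered")
        self.clusters[name] = ClusterRecord(name)
        self._place_all(sorted(self.pending))  # a new cluster may absorb waiting work

    def deregister(self, name: str) -> None:
        record = self.clusters.pop(name)
        self._place_all(sorted(record.workloads))  # re-home the removed cluster's work

    def report_status(self, name: str, ready: bool) -> None:
        record = self.clusters[name]
        record.ready = ready
        if not ready:
            # Migrate everything off a cluster that is no longer ready.
            moved = sorted(record.workloads)
            record.workloads.clear()
            self._place_all(moved)
        else:
            self._place_all(sorted(self.pending))

    def submit(self, workload: str) -> None:
        self._place_all([workload])

    def view(self) -> dict[str, list[str]]:
        """Unified view: cluster name -> sorted workloads running there."""
        return {name: sorted(r.workloads) for name, r in sorted(self.clusters.items())}

    def _place_all(self, workloads: list[str]) -> None:
        for workload in workloads:
            self.pending.discard(workload)
            ready = [r for r in self.clusters.values() if r.ready]
            if not ready:
                self.pending.add(workload)
                continue
            target = min(ready, key=lambda r: (len(r.workloads), r.name))
            target.workloads.add(workload)
```

A quick walk-through: register `us-east` and `eu-west`, submit three workloads, then report `eu-west` as not ready; its workloads migrate to `us-east`. If `us-east` is then de-registered, the workloads wait in `pending` until `eu-west` reports ready again. A real federation would replace the least-loaded rule with the distribution policies described above.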
Global Telecom Federation
Cloud-computing federations have great potential to deliver large resource efficiencies. Although much work remains to move this technology forward, telecommunications companies can seize the opportunities that federated and hybrid clouds present for generating new business and higher profits.