Cloud computing
In this week’s lecture, we are primarily concerned with the underlying infrastructure that our machine learning system will run on.
We talked about how to build a cloud, and identified key ingredients including:
- hardware and connectivity
- compute (bare metal, VM, container), network, and storage (block, file, object) services
- shared services (authentication and authorization, bootable images)
- user interfaces (GUI, CLI, Python SDK)
and as an example, we named the OpenStack components that handle each of these. This semester we are going to spend a lot of time on Chameleon, which is an OpenStack cloud.
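As a quick reference, the mapping from ingredients to OpenStack components can be sketched as a small lookup table. The project names below are the standard OpenStack projects for each role, not necessarily the exact list from the lecture, and a given cloud (including Chameleon) may not deploy all of them. The helper `components_for` is a hypothetical name used only for illustration.

```python
# Ingredient -> OpenStack project that commonly provides it.
# Assumption: standard OpenStack project names; a particular deployment
# may omit some of these services or add others.
OPENSTACK_COMPONENTS = {
    "compute (VM)": "Nova",
    "compute (bare metal)": "Ironic",
    "compute (container)": "Zun",
    "network": "Neutron",
    "storage (block)": "Cinder",
    "storage (file)": "Manila",
    "storage (object)": "Swift",
    "authentication and authorization": "Keystone",
    "bootable images": "Glance",
    "GUI": "Horizon",
    "CLI": "python-openstackclient",
    "Python SDK": "openstacksdk",
}


def components_for(keyword):
    """Return {ingredient: project} for ingredients whose name contains keyword."""
    keyword = keyword.lower()
    return {
        ingredient: project
        for ingredient, project in OPENSTACK_COMPONENTS.items()
        if keyword in ingredient.lower()
    }


if __name__ == "__main__":
    # List the three storage services named in the ingredient list.
    for ingredient, project in components_for("storage").items():
        print(f"{ingredient}: {project}")
```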
We said that besides providing access to compute, storage, and network resources, cloud infrastructure providers can also offer managed services, in which the provider assumes responsibility for parts of the stack that were previously the user’s responsibility. We named some cloud service models and described, for each one, what the infrastructure provider is responsible for providing and maintaining.
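To make the division of responsibility concrete, here is a minimal sketch that assumes the commonly cited models (IaaS, PaaS, SaaS) and a simplified nine-layer stack. The layer names, the cut-off points, and the function `responsibilities` are illustrative assumptions; real offerings draw the boundaries somewhat differently.

```python
# A simplified stack, listed from the top (closest to the user) down to the hardware.
# Assumption: this layering is a common teaching simplification, not a standard.
LAYERS = [
    "application",
    "data",
    "runtime",
    "middleware",
    "operating system",
    "virtualization",
    "servers",
    "storage",
    "networking",
]

# How many layers, counted from the top, remain the user's responsibility.
USER_MANAGED_TOP_LAYERS = {
    "on-premises": 9,  # the user manages everything
    "IaaS": 5,         # provider manages virtualization and the hardware below it
    "PaaS": 2,         # provider also manages the OS, middleware, and runtime
    "SaaS": 0,         # provider manages the whole stack
}


def responsibilities(model):
    """Split LAYERS into the parts managed by the user and by the provider."""
    n = USER_MANAGED_TOP_LAYERS[model]
    return {"user": LAYERS[:n], "provider": LAYERS[n:]}


if __name__ == "__main__":
    for model in USER_MANAGED_TOP_LAYERS:
        split = responsibilities(model)
        print(f"{model}: user manages {split['user'] or 'nothing'}")
```

Moving from IaaS toward SaaS, the list under `"user"` shrinks and the list under `"provider"` grows, which is the shift in responsibility that managed services represent.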
We also talked about how the virtualization of compute, storage, and networking resources in the cloud provides opportunities to manage our services like “cattle” instead of like “pets”: as interchangeable instances that are replaced rather than individually repaired, with systems in place to handle their lifecycles at scale. We introduced containers (e.g. Docker) and container orchestration frameworks (e.g. Kubernetes) as tools for managing services at scale.
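The “cattle” approach can be illustrated with a toy reconciliation loop: we declare how many instances we want, and an automated step discards unhealthy instances and launches replacements instead of repairing them by hand. This is a simplified sketch of the idea behind orchestration frameworks, not how Kubernetes is implemented; `Instance`, `launch`, and `reconcile` are hypothetical names, and no real cloud API is called.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True


# Instances get sequential, disposable names; nobody is expected to remember them.
_ids = itertools.count(1)


def launch(image):
    """Stand-in for starting a new instance from an image (no real API call)."""
    return Instance(name=f"{image}-{next(_ids)}")


def reconcile(instances, desired, image="web"):
    """Return a fleet of `desired` healthy instances.

    Unhealthy instances are dropped rather than repaired, surplus instances
    are removed, and missing ones are launched from the same image.
    """
    survivors = [inst for inst in instances if inst.healthy]
    survivors = survivors[:desired]  # scale down if there are too many
    while len(survivors) < desired:  # scale up or replace failures
        survivors.append(launch(image))
    return survivors


if __name__ == "__main__":
    fleet = reconcile([], desired=3)     # initial deployment
    fleet[1].healthy = False             # simulate a failure
    fleet = reconcile(fleet, desired=3)  # the failed instance is replaced
    print([inst.name for inst in fleet])
```

Because every instance is launched from the same image, a replacement is equivalent to the instance it replaces, which is what makes this kind of automation safe.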