Instructor Guide to Teaching on Chameleon Cloud

This page includes some notes for instructors who want to use these materials in their own courses.

Overview of available lab materials

Lab Instructions	Trovi Artifact	GitHub Repo
Hello, Chameleon (Intro)
Hello, Linux (Intro)	(Use "Hello, Chameleon" artifact)
Cloud computing on Chameleon (Cloud computing)
Build an MLOps pipeline on Chameleon (DevOps and continuous X for ML systems)
Persistent storage on Chameleon (Large scale data systems)
Large-scale model training on Chameleon (Model training at scale) ⛔ Deprecated in favor of new vision-language model training lab
Training a large vision-language model (Model training at scale) 🔧 In development
Train ML models with MLFlow and Ray (Model training infrastructure and platforms) ⛔ Deprecated in favor of separate MLFlow + Ray labs
ML experiment tracking with MLFlow (Model training infrastructure and platforms)
Building a model training cluster with Ray (Model training infrastructure and platforms) 🔧 In development
Model optimizations for serving (Model serving)
Serving on edge devices (Model serving)
System optimizations for model serving (Model serving)
Offline evaluation of ML systems (Monitoring and evaluating ML systems)
Online evaluation of ML systems (Monitoring and evaluating ML systems)
Closing the feedback loop (Monitoring and evaluating ML systems)

Things that are known to be broken

Over the course of Fall 2025, I will be changing and updating this material in preparation for Spring 2026. I will almost certainly break some things along the way.

Before the class begins

Before the class begins, an instructor should:

Set up a project

Create an account on Chameleon Cloud, and create a project for the course.

From the project page, click “Add multiple users” and then copy the “request to join” link, which you can distribute to your students.

Increase project quota

Use the “Help Desk” feature on Chameleon to request a quota increase for KVM@TACC for your project.

For network access, request:

unlimited private networks and subnets,
“floating IPs” increased to 1.5x the expected enrollment,
and 50 security groups.
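
As a quick way to prepare the numbers for the Help Desk ticket, the network request above can be computed from the expected enrollment. This is a minimal sketch: the function name and dictionary keys are hypothetical labels (not Chameleon or OpenStack quota field names), and rounding the floating IP count up to a whole number is an assumption.

```python
import math


def network_quota_request(enrollment: int) -> dict:
    """Return the network quota values to request for a class of this size.

    The keys are illustrative labels only, not official quota field names.
    """
    if enrollment < 0:
        raise ValueError("enrollment must be non-negative")
    return {
        "private_networks": "unlimited",
        "subnets": "unlimited",
        # 1.5x the expected enrollment, rounded up (rounding is an assumption)
        "floating_ips": math.ceil(1.5 * enrollment),
        # Fixed value from this guide, independent of class size
        "security_groups": 50,
    }


if __name__ == "__main__":
    for name, value in network_quota_request(25).items():
        print(f"{name}: {value}")
```

For example, an expected enrollment of 25 gives a request for 38 floating IPs, since 37.5 is rounded up.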

If you are using “Cloud computing” and/or “MLOps Pipeline” labs, you need the quota to permit up to 3 m1.medium instances per student at a time:

The “number of instances” quota should be increased to 3x the expected enrollment
“number of cores” should be increased to 6x the expected enrollment
“RAM” should be increased to 4GB x 3 x expected enrollment
“number of routers” increased to 1x the expected enrollment.

If you are not using the “Cloud computing” or “MLOps Pipeline” labs, but you are using “Persistent data”, you need the quota to permit 1 m1.large instance per student at a time:

The “number of instances” quota should be increased to 1x the expected enrollment
“number of cores” should be increased to 4x the expected enrollment
“RAM” should be increased to 8GB x expected enrollment

If you are using “Persistent data”, you also need one 2GB block storage volume per student at a time:

“number of block storage volumes” should be increased to the expected enrollment, and total block storage should be increased to 2GB x expected enrollment.

If you are not using the “Cloud computing”, “MLOps Pipeline”, or “Persistent data” labs, but you are using “Evaluation and Monitoring”, you need the quota to permit 1 m1.medium instance per student at a time:

The “number of instances” quota should be increased to 1x the expected enrollment
“number of cores” should be increased to 2x the expected enrollment
“RAM” should be increased to 4GB x expected enrollment
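
The per-student multipliers in the three cases above can be collected into one small calculator. The sketch below is a hypothetical helper, not part of any Chameleon tooling. The profile names and dictionary keys are made up for illustration, and the per-instance sizes (2 cores and 4GB for m1.medium, 4 cores and 8GB for m1.large) are inferred from the multipliers in this guide rather than taken from the flavor definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabProfile:
    """Per-student KVM@TACC needs for one of the cases in this guide."""
    instances: int
    cores_per_instance: int
    ram_gb_per_instance: int
    routers: int = 0


# Hypothetical profile names; the numbers come from the lists above.
PROFILES = {
    "cloud_or_mlops": LabProfile(3, 2, 4, routers=1),   # 3 x m1.medium
    "persistent_data": LabProfile(1, 4, 8),             # 1 x m1.large
    "evaluation_monitoring": LabProfile(1, 2, 4),       # 1 x m1.medium
}


def compute_quota(profile_name: str, enrollment: int,
                  uses_persistent_data: bool = False) -> dict:
    """Scale a per-student profile up to the expected enrollment."""
    p = PROFILES[profile_name]
    quota = {
        "instances": p.instances * enrollment,
        "cores": p.instances * p.cores_per_instance * enrollment,
        "ram_gb": p.instances * p.ram_gb_per_instance * enrollment,
    }
    if p.routers:
        quota["routers"] = p.routers * enrollment
    # "Persistent data" adds one 2GB block storage volume per student,
    # whichever compute profile applies.
    if uses_persistent_data or profile_name == "persistent_data":
        quota["volumes"] = enrollment
        quota["volume_storage_gb"] = 2 * enrollment
    return quota


if __name__ == "__main__":
    print(compute_quota("cloud_or_mlops", 40, uses_persistent_data=True))
```

For an expected enrollment of 40 using the “Cloud computing” labs, this yields 120 instances, 240 cores, 480GB of RAM, and 40 routers. Setting `uses_persistent_data=True` adds 40 volumes and 80GB of block storage.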

If students will develop open-ended projects, you may need to request additional quota increases depending on their needs. However, you can do this later on an as-needed basis, if you keep an eye on usage.

Reserve GPU nodes

To ensure that your students will be able to access GPU resources as needed, you will pre-reserve the bare metal hosts you need for the period leading up to each lab's due date. The following table shows the GPU types and expected number of hours per student for each lab:

Assignment	Instance Type(s)	Number of Hours per Student
Train at Scale (Multi GPU)	`gpu_a100_pcie`, `gpu_v100`	2
Train at Scale (One GPU)	`compute_gigaio` at CHI@UC only (needs A100 80GB)	2
Training in a Cluster (Multi GPU)	`gpu_mi100`	3
Experiment Tracking (One GPU)	`compute_liqid` at CHI@TACC or `compute_gigaio` at CHI@UC	3
Model Serving Optimizations	`compute_liqid` at CHI@TACC or `compute_gigaio` at CHI@UC	3
Serving from the Edge	`rpi5` on CHI@Edge (you may need to BYOD)	2
System Serving Optimizations	`gpu_p100`	3

Then, use the “Help Desk” feature on Chameleon. Give the list of reservations you have made, and ask for these resources to be allocated for exclusive use for your course during the times you have reserved.
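
To decide how large those reservations need to be, it can help to convert the hours-per-student column of the table into total node-hours. The sketch below is a rough estimate only. It assumes one student occupies one node at a time and that usage is spread perfectly evenly, which real classes will not achieve, so treat the result as a lower bound. The function names are hypothetical.

```python
import math

# Hours per student, copied from the table above.
HOURS_PER_STUDENT = {
    "Train at Scale (Multi GPU)": 2,
    "Train at Scale (One GPU)": 2,
    "Training in a Cluster (Multi GPU)": 3,
    "Experiment Tracking (One GPU)": 3,
    "Model Serving Optimizations": 3,
    "Serving from the Edge": 2,
    "System Serving Optimizations": 3,
}


def node_hours(enrollment: int) -> dict:
    """Total node-hours per lab, assuming one student per node at a time."""
    return {lab: hours * enrollment for lab, hours in HOURS_PER_STUDENT.items()}


def min_reservation_hours(lab: str, enrollment: int, nodes: int) -> int:
    """Lower bound on reservation length (in hours) when `nodes` hosts are
    reserved in parallel and are fully used the whole time."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return math.ceil(HOURS_PER_STUDENT[lab] * enrollment / nodes)


if __name__ == "__main__":
    for lab, total in node_hours(40).items():
        print(f"{lab}: {total} node-hours")
```

For example, 40 students doing a 3-hour lab on 4 reserved nodes need at least 30 hours of reservation per node. In practice you would reserve well beyond this bound, since students rarely use the nodes back to back.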

If students will do open-ended projects that require GPU, you may want to make additional advance reservations to support this.

During the course

Communication to students

Give students explicit instructions about expected resource usage, and what they can expect to happen if they ignore these instructions (e.g. “if you make a reservation that is longer than 4 hours for X resource, course staff will delete it”). Also remind students that the infrastructure cannot support all of them doing the assignment at the same time in the last few hours before the deadline.
If you have a large class, you may want to assign days to smooth peak usage for lab assignments, e.g. “if your student ID ends in an even number you can use the infrastructure on Monday, Wednesday, Friday, or Saturday; if your student ID ends in an odd number you can use the infrastructure on Tuesday, Thursday, Friday, or Sunday”.
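
The even/odd schedule in the example above is simple to apply to a roster. This is a minimal sketch that assumes student IDs end in a decimal digit. The function name is hypothetical, and the day lists are taken from the example rather than being a recommendation.

```python
EVEN_DAYS = ("Monday", "Wednesday", "Friday", "Saturday")
ODD_DAYS = ("Tuesday", "Thursday", "Friday", "Sunday")


def allowed_days(student_id: str) -> tuple:
    """Return the days a student may use the infrastructure, based on
    whether the last character of the ID is an even or odd digit."""
    last = student_id.strip()[-1:]
    if not last.isdigit():
        raise ValueError(f"student ID must end in a digit: {student_id!r}")
    return EVEN_DAYS if int(last) % 2 == 0 else ODD_DAYS


if __name__ == "__main__":
    # Made-up IDs for illustration
    for sid in ("N10000002", "N10000007"):
        print(sid, "->", ", ".join(allowed_days(sid)))
```

Both groups share Friday in this example, so expect Friday to carry roughly the full class load.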

Managing resources

Keep an eye on resource usage, to make sure nobody is using resources excessively and to make sure resources are available to students who need them.
At the beginning of your advance reservations, after Chameleon staff have re-configured the resource to be exclusively available to your project, you will delete your “placeholder” reservation so that students can then make their own reservations.