Project

For your course project, you will design and implement an end-to-end ML system. You must use the techniques discussed in our lectures to address some of the challenges raised in those lectures.

  1. Project context requirement
  2. Group work expectations
  3. Project deliverables and deadlines
    1. Project proposal (Due Mar 2)
  4. Policy on AI use

Project context requirement

For your project, you will integrate one or more complementary ML features into an existing open-source, self-hosted software system that you will run on Chameleon.

Why? In practice, ML models most often operate as components within larger systems that impose constraints around data availability, latency, reliability, deployment, and operational ownership. If you design a new service “around the model,” you can ignore these constraints and do whatever is convenient, which bypasses the core challenges the course is intended to teach. So instead, we are asking you to design and implement a complementary feature in the context of an existing system and its constraints.

For example, you may design a feature that complements:

If the project you plan to complement has fewer than 2.5k stars on GitHub, you should get advance approval from course staff before preparing your proposal.

You don’t have to use the project exactly as intended - for example, if you want to implement an ML feature for team chat that is specifically designed for students working on a group project together, you can do it in Zulip even if that’s a general purpose chat service. Or if you want to implement an ML feature for a news website, you can do it with Ghost, and so on.

Also, integration with the core open source project is exempt from the “you must understand everything about your code and what it does” policy on AI use. You are welcome to vibe code the part that integrates with the core open source service. Right around the time when you’ll be integrating your system (in April), we’ll give some additional guidance and resources on using an AI coding agent to help with this part.

Additional requirements:

  • Your ML feature must be designed so that when deployed in “production”, you get new data and feedback from “users”, and can use this for retraining.
  • You can use an LLM out-of-the-box (without retraining) for part of your project, but if you do, you must also include another model that you train/retrain.
  • You must use at least one high-quality non-synthetic external dataset with known lineage (who created it, how, etc.).
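To make the feedback requirement concrete, here is a minimal sketch of how a served prediction might be logged and later joined with user feedback to form retraining examples. All names and structures here are illustrative only, not a required interface:

```python
from datetime import datetime, timezone

PREDICTION_LOG = {}   # in practice: a database or event stream, not a dict


def log_prediction(request_id, features, prediction):
    """Record what the model saw and predicted, keyed by request ID."""
    PREDICTION_LOG[request_id] = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "prediction": prediction,
        "feedback": None,
    }


def log_feedback(request_id, outcome):
    """Attach the user's outcome/label to the earlier prediction."""
    if request_id in PREDICTION_LOG:
        PREDICTION_LOG[request_id]["feedback"] = outcome


def retraining_examples():
    """Yield (features, label) pairs for records that received feedback."""
    for rec in PREDICTION_LOG.values():
        if rec["feedback"] is not None:
            yield rec["features"], rec["feedback"]


# Example flow: serve a prediction, collect feedback, harvest a label.
log_prediction("req-1", {"text": "hello"}, "spam")
log_feedback("req-1", "not_spam")   # the user corrected the model
print(list(retraining_examples()))  # → [({'text': 'hello'}, 'not_spam')]
```

The key design point is that predictions and outcomes are joined by a shared ID, so the "closing the feedback loop" work (part of Unit 8) produces labeled data your retraining pipeline can consume.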

Group work expectations

You will complete these projects in groups of 3 or 4, where certain elements of the project are going to be “owned” by all group members, and other parts are going to be “owned” by individual group members.

| Role | Responsibilities |
|---|---|
| All group members (joint) | Project idea and value proposition; high-level approach; overall system integration. For a 3-person team: Platform / DevOps responsibilities are shared, and each member owns the automation related to their primary role (Unit 3). |
| Training | Model training and retraining pipelines (Units 5–6); offline evaluation (part of Unit 8); safeguarding elements related to role (Unit 10) |
| Serving | Model serving (Unit 7); online evaluation and monitoring (part of Unit 8); safeguarding elements related to role (Unit 10) |
| Data | Data pipeline (Unit 4); closing the feedback loop (getting outcomes/labels in production) (part of Unit 8); emulated operational data; safeguarding elements related to role (Unit 10) |
| DevOps / Platform (4-person team) | Infrastructure as code, CI/CD/CT pipelines, automation (Unit 3); infrastructure monitoring and observability; safeguarding elements related to role (Unit 10) |

Part of your project grade will be common to the entire group, based on the “jointly owned” elements and shared responsibilities; part of your project grade will be individual, based on the work you have produced in your personal role.

Can I work by myself and take on all of these roles? No, not in this course. An explicit learning objective of this course is to practice building, operating, and integrating ML systems as a team activity. In real ML systems, components such as data pipelines, training workflows, serving infrastructure, and automation are developed independently and must interoperate through well-defined contracts. In a solo project, you can change interfaces arbitrarily to simplify implementation, which bypasses the core challenge of designing components to work as part of a whole. Therefore, a group project is required.

Project deliverables and deadlines

| Milestone | Due Date | Points | Scope |
|---|---|---|---|
| Project proposal | Mar 2, 2026 | 5 / 40 | Problem statement, data sources, modeling approach, alignment with business requirements |
| Initial implementation | Apr 6, 2026 | 10 / 40 | Data, model training, model serving, monitoring and evaluation implemented individually (not necessarily integrated); overall pipeline with dummy steps also implemented for 4-person groups |
| System implementation | Apr 20, 2026 | 15 / 40 | All components tightly integrated into a single end-to-end ML system, including safeguarding |
| Ongoing operation | May 4, 2026 | 10 / 40 | Operation with emulated “live” data; operational behavior, stability, and evaluation over time |

More specific information will be shared ahead of each deadline.

Project proposal (Due Mar 2)

Focus: intent, feasibility, business alignment.

Format: You will submit a document (max 2 pages) and slides for a presentation (10 minutes for a 3-person team, 12 minutes for a 4-person team) covering the items listed below. You will also sign up for a presentation slot during the week of March 2, in which your group will present the proposal to a pair of course assistants and answer questions about it.

Rubric: The proposal will be graded according to the following rubric:

Requirements checklist (all must be satisfied, otherwise the team cannot proceed with the proposed project):

  • Team defines a hypothetical service into which the ML feature will be integrated
  • The service will be realized using an existing open source project (at least 2.5k stars on GitHub)
  • The proposed ML feature(s) will be a complementary feature
  • The service will be fully hosted on Chameleon
  • The proposed design involves at least one model that is trained/retrained
  • Training will involve at least one high-quality non-synthetic external dataset with known lineage
  • When deployed in “production”, the system will get new data and feedback from “users”, and can use this for retraining

Joint responsibilities (3/5 points, all team members will have the same score for this part):

  • (0.5 points) Describe the public-facing service that you will realize with the selected open source project (not the ML feature - the service that the ML feature will be complementary to). Discuss the audience (including the anticipated number of users you are designing for), what their context is, etc.
  • (2 points) Describe the design of the complementary ML feature, following the process from 1.5.5 Specifying the design, and answer questions posed by the course assistants. Make sure to discuss feedback, and how it will be used for re-training, since this is a strict requirement.
  • (0.5 points) Describe external dataset(s) you will use, including a discussion of alignment with the proposed public-facing service. Show a few examples of real data points, and explain the lineage of the data (who collected it, how, why). (Refer to 4.3 Acquiring training data.)

Training team member (2/5 points):

  • (1 point) Specify the type of model(s) that will be used to realize the ML feature(s), and how they will be trained/re-trained.
  • (1 point) Specify input features and output.

Serving team member (2/5 points):

  • (1 point) Estimate operational requirements for serving your ML feature, with suggested numbers (requests/second, latency/request, etc.) and justification.
  • (1 point) Describe how the model output(s) will translate to an outcome in the real system.
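An estimate like this can be a short back-of-envelope calculation. The numbers below are invented purely for illustration; yours should come from your own audience and usage assumptions:

```python
# Hypothetical load estimate for a complementary ML feature.
daily_active_users = 2000          # assumed audience size
requests_per_user_per_day = 10     # assumed usage pattern
peak_to_average_ratio = 5          # traffic is bursty, not uniform

avg_rps = daily_active_users * requests_per_user_per_day / 86400
peak_rps = avg_rps * peak_to_average_ratio

latency_budget_s = 0.2             # e.g. a 200 ms per-request target
# Little's law: in-flight requests ≈ arrival rate × time in system
concurrent_at_peak = peak_rps * latency_budget_s

print(f"avg {avg_rps:.2f} req/s, peak {peak_rps:.2f} req/s, "
      f"~{concurrent_at_peak:.1f} concurrent requests at peak")
```

Even a rough calculation like this tells you how much serving capacity to provision and whether your latency target is compatible with the expected concurrency.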

Data team member (2/5 points):

  • (1 point) Describe the data flow - what data arrives at the system, how it is processed in real time for inference, how it is processed for training. (You will not specify frameworks or tools at this stage - describe what will happen to data, not how you will implement it.)
  • (1 point) Discuss training data more specifically, including plans for candidate selection (4.7.2 Candidate selection) and avoiding data leakage (4.7.5 Splitting and leakage).
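As one illustration of the leakage concern (see 4.7.5): a random split can put "future" records in the training set, whereas a time-based split keeps everything after a cutoff out of training. A toy sketch, with made-up data:

```python
# Toy records: (timestamp, features, label). Splitting randomly could
# leak future information into training; splitting on time cannot.
records = [
    (1, {"x": 0.1}, 0),
    (2, {"x": 0.4}, 1),
    (3, {"x": 0.5}, 1),
    (4, {"x": 0.2}, 0),
]

cutoff = 3  # train on t < 3, evaluate on t >= 3
train = [r for r in records if r[0] < cutoff]
test = [r for r in records if r[0] >= cutoff]

print(len(train), len(test))  # → 2 2
```

A time-based split is only one of several leakage defenses; your proposal should explain which ones apply to your data and why.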

DevOps/Platform team member (4-person teams only) (2/5 points):

  • (1 point) Describe freshness requirements for models (how frequently, and under what circumstances, should they be retrained?) with justification, and how this will fit into your proposed automation lifecycle.
  • (1 point) Describe scaling requirements for the deployment (e.g. what is peak usage, what is typical usage, how will you “right size”).

Policy on AI use

The ML Systems Design and Operations course focuses on

  • designing systems, by identifying requirements and evaluating tradeoffs
  • and then operationalizing those designs

The cognitive work in this course is not writing code or configurations; it is making correct decisions, understanding trade-offs, defending decisions, explaining the system to stakeholders, and diagnosing failures (i.e. “making it work”). So, you are permitted to use LLMs to help write code and configs, but only as an implementation tool to help realize your design, not as designers.

What that means in practice for your course project is:

You own the design. You (the human) must develop the design yourself. You’ll be asked to defend your design choices, answer “what if” questions about changing requirements, and discuss tradeoffs. If you haven’t thought deeply about the problem and considered all the possibilities, you’ll struggle to do that.

LLMs may help implement your design. You can ask an LLM to help you write or modify code and configs, with the following constraints:

  1. Start from the provided labs when possible. Wherever possible, you should use the lab assignments as a starting point for code or configs (like a human would!), and build on that rather than starting from scratch. (Of course, if you are implementing something we didn’t do in the lab, you’ll do it from scratch.) This is practical (you avoid having to debug problems that I’ve already solved when developing the lab!) and it’s also realistic (in most settings, you will be modifying existing pipelines and systems, not starting a greenfield design from scratch).
  2. You specify; the LLM executes. You tell the LLM what to do, based on the design you developed.
  3. You must understand what it produced. You are responsible for being able to explain any code or configuration that appears in your project, including what it does and why it is needed for your design.
  4. No silent design changes. Do not allow the LLM to change configurations, parameters, or pipeline structure without your explicit decision and justification. (This is something I have noticed they tend to do when implementing ML systems.)
  5. Disclosure is required. Any commit that includes LLM-generated or LLM-modified code or configuration must include a lightweight disclosure (e.g. Assisted by Codex 5.2 or equivalent).

Communication is human-only. All lab reports, project reports, project documentation, and slides must be written by you without AI assistance. This is because communicating your design is a core learning objective of this class. Only direct translation of your own writing into English (e.g., using Google Translate) is allowed.

Running systems matter, code itself doesn’t. In industry, the availability of LLMs has not made ML engineering work substantially easier. Instead, it has shifted how effort is spent: less time writing artifacts (code and configurations) line-by-line from scratch, and more time specifying intent, directing tools, reviewing generated code and configurations, making corrections, and ensuring that systems are correct, robust, and operational. To the extent that this process is sometimes faster, expectations around productivity have simply increased.

In this class, similarly, expectations around outcomes must be aligned with what people can do with LLM assistance. In the past, producing plausible but non-operational code or configurations could serve as evidence of partial understanding of the course material. Today, that is no longer the case, because generating artifacts that look reasonable but do not run requires no expertise. Therefore, these artifacts cannot earn any credit.

What matters in this course is not the ability to produce text or code, but the ability to design, justify, and operate a real ML system. So, in this project, you are graded on:

  • making sound system design choices (that are aligned with business requirements)
  • justifying those choices and trade-offs using course concepts
  • realizing those choices in operational systems running on the course infrastructure

There is no credit for systems that are not running on Chameleon Cloud. Code or configuration that has not been executed in the target environment, or that only runs locally, does not count. Producing text is easy; making a real system run is the work.