Machine Learning Systems Engineering and Operations

Machine Learning Systems Engineering and Operations Notes

This book follows the course overview for Machine Learning Systems Engineering and Operations and provides lecture notes for each week and topic.

Table of contents

  1. Machine learning systems
  2. Cloud computing
  3. ML operations (MLOps)
  4. Data systems
  5. Large model training
  6. Infrastructure and platforms for training
  7. Model serving
  8. Evaluation and monitoring
  9. Safeguarding
  10. Using commercial clouds
  11. GenAI and LLMOps
  12. RAG
  13. Agents and MCP