Machine Learning Systems Engineering and Operations Notes
This book follows the course overview for Machine Learning Systems Engineering and Operations and provides lecture notes for each week and topic.
Table of contents
- Machine learning systems
- Cloud computing
- ML operations (MLOps)
- Data systems
- Large model training
- Infrastructure and platforms for training
- Model serving
- Evaluation and monitoring
- Safeguarding
- Using commercial clouds
- GenAI and LLMOps
- RAG
- Agents and MCP