Site Reliability Engineer
As a Site Reliability Engineer at Red Hat, you will ensure the reliability, scalability, and performance of production services and cloud-based platforms. You will design and implement monitoring, alerting, and incident response systems to maintain high availability. Your work will involve automating operational tasks, building self-healing infrastructure, and developing tools to improve system observability. You will collaborate with development teams to embed reliability practices into the software development lifecycle, conduct root cause analysis of incidents, and drive improvements that reduce downtime. The role requires strong skills in systems administration, scripting, and cloud infrastructure management.