Cloud Site Reliability Engineer
The Cloud Site Reliability Engineer in the NICE Public Safety team automates, supports, and maintains applications 24/7, using automation, tooling, telemetry, and data to reduce issues and speed up their detection and resolution. Key responsibilities include acting as production gatekeeper, managing the work backlog and developing reliability improvements; leading root-cause investigations into outages and performance issues; leading initiatives to automate low-value tasks; providing technical leadership to the Cloud Operations and Support teams; developing and configuring monitoring dashboards and alerts in Grafana and Azure Monitor; installing and configuring the observability platform (Grafana, Prometheus, Azure Monitor, OpenTelemetry); developing Bicep modules for monitoring infrastructure; and configuring CI/CD pipelines in Azure DevOps. The role requires 3+ years of SRE experience and expertise in databases (MS-SQL, Elasticsearch), programming (C#, PowerShell), infrastructure as code (ARM, Bicep, Git), Kubernetes, and Azure.