Site Reliability Engineer

Site Reliability Engineer / DevOps Engineer (Azure)



An opportunity to join one of the top UK insurers, which is on a mission to become the leading 'digital first' insurer in the UK.

As a Site Reliability Engineer, you will be the backbone of their Azure environment, ensuring its scalability, reliability, and operational excellence.

You will work closely with cross-functional teams to build and maintain a robust infrastructure that supports their dynamic needs.



Key Responsibilities:



  • Assume responsibility for the observability suite, encompassing tools for monitoring, logging, and alerting, to guarantee a thorough and integrated understanding of system functionality and health.

  • Set up and oversee APM tools like Dynatrace or New Relic, leveraging their features to effectively monitor application performance and resolve problems.

  • Employ extensive DevOps expertise to establish and uphold infrastructure as code (IaC) methodologies, streamlining the processes of deployment, scaling, and management through automation.

  • Actively track and pinpoint issues related to performance and reliability in APIs and applications, and devise strategies to address these concerns.

  • Work in tandem with development teams to fine-tune application performance, enhance the efficiency of resource use, and improve scalability.

  • Develop and maintain comprehensive incident response and review protocols to reduce system downtime and prevent problems from recurring.

  • Drive continuous improvement of the reliability, scalability, and operational efficiency of Ageas' infrastructure and services, staying ahead of client expectations.

  • Engage in the on-call schedule, offering support for resolving incidents and conducting necessary troubleshooting.



Qualifications:



  • Experience in a DevOps / Site Reliability Engineer (SRE) position, dedicated to ensuring the high availability, reliability, and scalability of live systems.

  • Proficient in observability tools like Prometheus, the ELK stack, Grafana, and Azure Monitor, and capable of fully managing the suite for optimal system oversight.

  • Skilled in operating APM tools such as Dynatrace or New Relic, with a track record of using these tools to effectively monitor and enhance application performance.

  • A thorough grasp of DevOps methodologies, including the use of Terraform for infrastructure as code (IaC), and expertise in automated deployment and configuration management.

  • Hands-on experience with programming environments such as Node.js, Java, and various JavaScript frameworks.

  • Familiar with cloud platforms, especially Azure, and adept at administering cloud-based infrastructure.

  • Demonstrated ability to anticipate and rectify issues impacting the performance and reliability of APIs and applications.

  • Excellent teamwork and communication abilities, ensuring productive collaboration with diverse functional groups.



Remote based.


Paying up to 75k, depending on experience.



