Site Reliability Engineer

A Site Reliability Engineer role has opened up at one of our global banking clients.

A leading financial services organisation is seeking a Site Reliability Engineer to join their growing technology team in Taguig. This is an exceptional opportunity for you to play a pivotal role in ensuring the reliability, scalability, and performance of critical systems that underpin digital banking operations. You will be welcomed into a supportive environment where your expertise will be valued, and your professional growth will be encouraged through ongoing training opportunities and exposure to cutting-edge technologies. The organisation is committed to fostering an inclusive workplace culture, offering flexible working arrangements and a strong sense of community among colleagues. If you are passionate about building dependable systems and thrive in environments where collaboration and knowledge sharing are at the heart of success, this could be the perfect next step in your career.

Join a highly collaborative team dedicated to maintaining the stability and efficiency of essential digital banking platforms, where your contributions will have a direct impact on millions of users.
Benefit from flexible working opportunities and continuous training programmes designed to support your personal and professional development within a nurturing environment.
Be part of an inclusive workplace that values diverse perspectives, encourages open communication, and prioritises the well-being of every team member.

What you'll do:

As a Site Reliability Engineer based in Taguig, you will play a crucial role in safeguarding the performance and resilience of digital banking services relied upon by customers every day. Your responsibilities will span proactive monitoring of complex systems, collaborating with cross-functional teams to implement robust solutions, automating key operational tasks for efficiency gains, and responding swiftly to incidents when they arise. You will also contribute significantly to knowledge sharing through detailed documentation and active participation in post-incident reviews. By supporting capacity planning efforts and upholding rigorous standards for security and compliance, you will help ensure that the organisation’s technology infrastructure remains dependable as it evolves. Success in this position requires not only technical proficiency but also a commitment to teamwork, open communication, and continuous learning within a supportive environment.

Monitor system health and performance using advanced observability tools, proactively identifying potential issues before they affect end users.
Collaborate closely with software engineers, infrastructure teams, and business stakeholders to design robust solutions that enhance system reliability and availability.
Automate operational processes by developing scripts and tools that streamline deployment, monitoring, and incident response workflows.
Respond promptly to incidents by troubleshooting complex technical problems, coordinating with relevant teams, and implementing effective resolutions to minimise downtime.
Participate in post-incident reviews to identify root causes, share learnings across teams, and drive continuous improvement initiatives for future prevention.
Maintain comprehensive documentation for system configurations, procedures, and best practices to ensure knowledge is shared across the team.
Contribute to capacity planning efforts by analysing usage patterns and forecasting future resource requirements based on business growth projections.
Support change management processes by evaluating risks associated with new deployments or updates, ensuring smooth transitions with minimal disruption.
Champion best practices in security, compliance, and data protection throughout all aspects of site reliability engineering activities.

What you bring:

To excel as a Site Reliability Engineer in this organisation’s Taguig office, you will bring proven experience operating mission-critical systems at scale—ideally within financial services or similarly regulated sectors. Your background should include hands-on work with cloud-based architectures as well as automation frameworks that drive operational excellence. A keen eye for detail combined with methodical problem-solving abilities will enable you to address challenges efficiently while minimising risk. Your approach should emphasise empathy towards colleagues’ perspectives during incident resolution or process improvements. The ideal candidate thrives when sharing knowledge openly with others—whether through documentation or mentoring—and demonstrates adaptability when faced with evolving technologies or shifting priorities. Above all else, your dependability ensures that both internal teams and external customers can trust the stability of essential services.

Demonstrated experience managing large-scale distributed systems within cloud or hybrid environments using modern observability tools such as Prometheus or Grafana.
Proficiency in scripting languages like Python or Bash for automating operational tasks and improving workflow efficiency.
Solid understanding of containerisation technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for deploying scalable applications.
Familiarity with configuration management tools such as Ansible or Terraform for infrastructure automation.
Strong troubleshooting skills with the ability to diagnose complex issues under pressure while maintaining clear communication with stakeholders.
Experience participating in incident response processes including root cause analysis and post-mortem documentation.
Knowledge of best practices related to security, compliance standards (such as ISO 27001), and data protection within regulated industries.
Excellent interpersonal skills enabling effective collaboration across multidisciplinary teams in fast-evolving environments.
Commitment to continuous learning demonstrated through engagement with training opportunities or industry certifications.

What sets this company apart:

This organisation stands out for its unwavering commitment to employee well-being, professional development, and inclusivity. Team members benefit from flexible working arrangements that accommodate diverse lifestyles while promoting work-life balance. Comprehensive training programmes empower individuals at every stage of their careers, whether deepening technical expertise or exploring new areas of interest within technology. The company fosters a culture where collaboration is celebrated, ideas are freely exchanged across departments, and everyone’s voice is heard regardless of background or tenure. Employees enjoy access to state-of-the-art resources designed to support innovation without compromising on the security and compliance standards vital in financial services. With a focus on nurturing talent from under-represented groups alongside established professionals, this employer offers an environment where you can grow both personally and professionally while making meaningful contributions every day.

What's next:

If you are ready to take your career forward as a Site Reliability Engineer within an inclusive team that values your expertise and supports your ambitions, apply today by clicking on the link provided.

Due to the high volume of applications we are experiencing, our team will only be in touch with you if your application is shortlisted.
