Site Reliability Engineer, Fleet – REMOTE

Full time @CISCO Meraki in Information Technology (IT)
  • Remote within US
  • Post Date : May 30, 2025
  • Apply Before : June 16, 2025

Job Detail

  • Job ID : 23999
  • Experience : 2 Years
  • Qualifications : Bachelor's Degree

Job Description

“RESPONSIBILITIES
Develop and maintain automation code for cloud maintenance processes using Ansible and Ruby.
Efficiently coordinate and execute large-scale maintenance operations, acting as a central point of contact between multiple teams.
Debug and resolve complex failure scenarios across large-scale systems, ensuring high availability and reliability.
Design, implement, and optimize GitLab CI pipelines to streamline deployment and testing workflows.
Collaborate with engineering teams to identify and address performance bottlenecks and scaling challenges.
Proactively troubleshoot issues across the fleet, using a deep understanding of Linux systems and networking.
Contribute to the creation of robust unit tests and infrastructure testing suites with RSpec.
Participate in collaborative projects to improve infrastructure efficiency, scalability, and observability.
Work cross-functionally with teams in different time zones, fostering a culture of shared ownership and reliability.
Develop and maintain automated tools for collecting infrastructure data to support compliance requirements.
Streamline compliance processes by reducing manual overhead through automation.
Be part of an on-call SRE team responding in real time to production incidents.
YOU ARE AN IDEAL CANDIDATE IF YOU HAVE:
Experience in:
Working in Linux environments across multiple machines, including comfort with Bash scripting.
Scripting and programming languages, specifically for automation, ideally Ruby.
CI/CD pipelines, particularly GitLab CI.
Infrastructure automation, ideally Ansible.
Cloud infrastructure providers, ideally AWS.
Demonstrated experience troubleshooting and debugging in complex distributed systems.
Experience with monitoring and alerting tools such as Prometheus and Grafana.
Experience managing and optimizing fleets of thousands of machines.
Excellent collaboration skills and the ability to work effectively across teams in multiple time zones.
Passion for automation, scalability, and infrastructure as code.”
