senior site reliability engineer

Site Reliability Engineering, Amazon Web Services, Bare metal, Grafana, Linux, Prometheus, UNIX shell scripting, Docker, Kubernetes, Python, Terraform, DevOps
Sorry, this job is no longer available.

We are looking for a Senior Site Reliability Engineer to join our remote team.

In this position, you will be responsible for the day-to-day operations of our massively scalable and highly available backend platform. You will also run the hybrid infrastructure, automate routine operational tasks, and keep our production services running smoothly.

responsibilities
  • Ensure the smooth functioning of our production services, including day-to-day operations of the backend platform
  • Run and maintain the hybrid infrastructure, automating routine operational tasks
  • Monitor production services using Prometheus/InfluxDB, ELK, Grafana, and OpsGenie/PagerDuty
  • Troubleshoot software/hardware issues and investigate in depth why servers are underperforming
  • Manage production incidents and work with stakeholders to resolve issues and minimize the impact
  • Create and maintain comprehensive documentation of all operational procedures and processes
  • Collaborate with development teams to design scalable and reliable systems
  • Implement best practices for security and compliance
requirements
  • Minimum of 3 years of experience in Linux system administration, preferably on Ubuntu
  • Minimum of 3 years of experience in production monitoring using Prometheus/InfluxDB, ELK, Grafana, and OpsGenie/PagerDuty
  • Experience in building golden images and troubleshooting software/hardware issues
  • Ability to investigate in depth why servers are underperforming
  • Proficiency in Python, shell scripting, or Ansible
  • Experience working on geo-distributed and highly available production services
  • Strong knowledge of IPv4 and IPv6
  • Familiarity with data center operations
  • Hands-on experience in monitoring and debugging
  • Strong networking troubleshooting skills
  • Fluent verbal and written communication skills in English (B2+ level)
nice to have
  • Experience with Incident Management and SLA/SLO/SLI
  • Performance tuning experience
  • Proficiency in Terraform
  • Experience with Kubernetes/Docker
  • Familiarity with DevOps best practices

benefits for locations

Colombia
For you
  • Prepaid health plan with Colsanitas for you and your legal dependents
  • MetLife life insurance for you
  • Thousands of projects for top brands
  • Stable income
For your comfortable work
  • 100% remote work forever
  • Free licensed software
  • Possibility to work on your own device (BYOD)
  • Stable workload
  • Flexible engagement models
For your growth
  • Free training for technical and soft skills
  • Free access to LinkedIn Learning platform
  • Support from a personal Skill Advisor
  • Language courses
  • Free access to internal and external e-Libraries
  • Access to internal communities and competency centers
  • Certification opportunities