senior site reliability engineer

Site Reliability Engineering, Amazon Web Services, Bare metal, Grafana, Linux, Prometheus, UNIX shell scripting, Docker, Kubernetes, Python, Terraform
Sorry the job is no longer available.

We are seeking a Senior Site Reliability Engineer to join our remote team.

As an SRE, you will work closely with our distributed team, which runs our production operations. You will be responsible for the day-to-day operations of our massively scalable, highly available backend platform: running our hybrid infrastructure, automating routine operational tasks, leading individual projects, and communicating progress and achievements frequently.

responsibilities
  • Collaborate with teams to design, build, and maintain highly available and scalable infrastructure
  • Ensure the reliability and uptime of our services and applications
  • Automate routine operational tasks to improve efficiency and productivity
  • Lead individual projects and communicate progress and achievements frequently
  • Troubleshoot software and hardware issues, build golden images, and investigate in depth why servers are underperforming
  • Analyze why services and sites are unavailable or blocked, applying a working understanding of IPv4 and IPv6
  • Troubleshoot network issues such as below-expected throughput
  • Effectively communicate with teams to troubleshoot, debug, and resolve issues in both production and non-production environments
requirements
  • A minimum of 3 years of experience in Site Reliability Engineering
  • Strong experience with Amazon Web Services (AWS) and bare-metal infrastructure
  • Proficiency in Grafana, Prometheus, and UNIX shell scripting
  • Demonstrable experience with geo-distributed and highly available production services
  • Strong networking troubleshooting skills and proven experience in Linux system administration
  • Familiarity with data center operations
  • Fluent verbal and written communication skills in English (B2 level)
nice to have
  • Familiarity with incident management and SLA/SLO/SLI
  • Experience in performance tuning
  • Knowledge of Kubernetes, Docker
  • Expertise in Python and Terraform

benefits for locations

For you
  • Prepaid Medicine with Colsanitas for you and your legal dependents 
  • MetLife Life Insurance for you 
  • Thousands of projects for top brands
  • Stable income
For your comfortable work
  • 100% remote work forever
  • Free licensed software
  • Possibility to work on your own device (BYOD)
  • Stable workload
  • Flexible engagement models
For your growth
  • Free training in technical and soft skills
  • Free access to LinkedIn Learning platform
  • Support from a personal Skill Advisor
  • Language courses
  • Free access to internal and external e-Libraries
  • Access to internal communities and competency centers
  • Certification opportunities