Junior+ Site Reliability Engineer

Our client is hiring a Junior+ Site Reliability Engineer to support the reliability, monitoring, and incident response of high-availability distributed platforms through hands-on production work in a shift-based environment.

Role Overview:

Our client is expanding the engineering team responsible for ensuring the stability and predictable behaviour of their distributed services and platforms. This role involves hands-on production work, including monitoring, incident response, troubleshooting, and continuous improvements that increase platform reliability over time.

You will work as part of an SRE shift rotation covering late-evening and night hours, ensuring end-to-end ownership of incidents — from identifying user impact to post-incident follow-ups and preventive improvements.

Key Responsibilities:

Working in shift-based operations: monitoring, alert response, incident handling, escalation when needed;
Participating in incident handling: initial classification, technical investigation, coordination with engineering teams, and follow-up improvements;
Developing and refining observability across platforms (metrics/alerts, dashboards, logs);
Reducing operational toil: small-scale automation, runbooks, and repeatable processes (the “make it easier next time” mindset);
Collaborating with development teams to improve production readiness (basic reliability practices, cleaner incident follow-ups).

Required Skills & Experience:

Core skills:

Good Linux skills in production environments (debugging basics, system services, logs, performance basics);
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancing basics, TLS fundamentals);
Experience with containers and image lifecycle basics (Docker or compatible runtimes);
Ability to troubleshoot across application, network, and infrastructure layers using logs/metrics and simple tools (curl, basic traffic/log analysis; scripting is a plus);
Basic familiarity with observability: metrics and alerting, dashboards, logging (any modern stack is fine).

Experience:

1+ year in a production-focused role (Ops / Support L2+ / DevOps / Junior SRE — what matters is real production exposure);
Participation in production incidents (triage, investigation, escalation, basic follow-ups);
Availability to cover late-evening and night shifts, in rotation.

SRE fundamentals (basic understanding):

You understand the difference between “just running infra” and SRE as a discipline: reliability targets, fast detection, clear escalation, and consistent follow-up;
You’re familiar with SLI/SLO and can explain them in simple words (high-level understanding is enough).

What will be an advantage:

Familiarity with Kubernetes (deep production ownership is not required yet);
Exposure to AWS services such as EC2, ALB/NLB, RDS, S3, and IAM basics;
Exposure to Terraform and/or Ansible (small changes, basic understanding of principles);
Experience working in high-availability environments where downtime actually matters.

The company guarantees you the following benefits:

Global Collaboration: Join an international team where everyone treats each other with respect and moves towards the same goal;
Autonomy and Responsibility: Enjoy the freedom and responsibility to make decisions without the need for constant supervision;
Competitive Compensation: Receive a competitive salary that reflects your expertise and knowledge, as our partner seeks top performers;
Remote Work Opportunities: Embrace the flexibility of fully remote work, with the option to visit a company office near your current location;
Paid Time Off: Prioritise work-life balance with paid vacation and sick leave days to prevent burnout;
Career Development: Access continuous learning and career development opportunities to enhance your professional growth;
Corporate Culture: Experience a vibrant corporate atmosphere with exciting parties and team-building events throughout the year;
Referral Bonuses: Refer talented friends and receive a bonus after they successfully complete their probation period;
Medical Insurance Support: Choose the right private medical insurance and receive compensation (full or partial) based on the cost;
Flexible Benefits: Customise your compensation by selecting activities or expenses you'd like the company to cover, such as a gym subscription, language courses, Netflix subscription, spa days, and more;
Education Foundation: Participate in a biannual raffle for a chance to learn something new unrelated to your job as part of your commitment to ongoing education.

Interview process:

A 30-minute interview with a Recruiter to get to know you and your experience;
1st stage of technical interview (1 h) with the DevOps team to assess your theoretical knowledge;
2nd stage of technical interview (1 h) with the DevOps team to assess your hard skills;
A final interview to gauge your fit with the company culture and working style.