Middle Site Reliability Engineer (SRE)

Our partner is seeking a Middle Site Reliability Engineer to join the team behind their high-stakes iGaming platform, ensuring rock-solid uptime, blazing performance, and secure infrastructure for thousands of real-time bets across casino games and sports, as well as payments.

Role overview:

Our client is expanding the team responsible for the reliability and stability of their production systems across multiple platforms. In this role, you will work directly with live environments: monitoring, responding to incidents, and improving observability and operational processes.

This position suits engineers with real production exposure who understand that SRE is more than running infrastructure; it's about reliable services, fast detection, effective response, and continuous improvement. A high-level understanding of SLIs and SLOs is expected.

You will work in a shift-based setup, including late-evening and night rotations, taking ownership of incidents from detection to resolution and contributing to making systems more stable over time.

Key responsibilities:

Working in shift-based operations: monitoring, alert response, incident handling, escalation when needed;
Participating in incident management: initial classification, technical investigation, coordination with engineering/development teams, and follow-up improvements;
Developing and refining observability across platforms (metrics/alerts, dashboards, logs);
Reducing operational toil through small automations, runbooks, and repeatable processes (the "make it easier next time" mindset);
Maintaining documentation in the Atlassian ecosystem, including writing and updating knowledge base (KB) articles, runbooks, and other technical documentation;
Collaborating with development teams to improve production readiness (basic reliability practices, cleaner incident follow-ups).

Ideal profile for the position:

Core skills:

Good Linux skills in production environments (debugging basics, system services, logs, performance basics);
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancing basics, TLS fundamentals);
Experience with containers and image lifecycle basics (Docker or compatible runtimes);
Ability to troubleshoot across application, network, and infrastructure layers using logs/metrics and basic tools (curl, basic traffic/log analysis; scripting is a plus);
Basic familiarity with observability: metrics and alerting, dashboards, logging.

SRE fundamentals:

You understand the difference between “just running infra” and SRE as a discipline: reliability targets, fast detection, clear escalation, and consistent follow-up;
You’re familiar with SLI/SLO and can explain them in simple terms (a high-level understanding is enough).

Experience:

1+ year in a production-focused role (Ops, L2+ Support, DevOps, or Junior SRE; what matters is real production exposure);
Participation in production incidents (triage, investigation, escalation, basic follow-ups);
Availability to cover late-evening and night shifts on a rotating basis.

What will be an advantage:

Familiarity with Kubernetes (deep production ownership is not required at this stage);
Exposure to AWS services such as EC2, ALB/NLB, RDS, S3, and IAM basics;
Exposure to Terraform and/or Ansible (small changes, basic understanding of principles);
Experience working in high-availability environments where downtime actually matters.

The company offers the following benefits:

Global Collaboration: Join an international team where everyone treats each other with respect and moves towards the same goal;
Autonomy and Responsibility: Enjoy the freedom and responsibility to make decisions without the need for constant supervision;
Competitive Compensation: Receive a competitive salary that reflects your expertise and knowledge, as our partner seeks top performers;
Remote Work Opportunities: Embrace the flexibility of fully remote work, with the option to visit a company office near your current location;
Paid Time Off: Prioritise work-life balance with paid vacation and sick leave days to prevent burnout;
Career Development: Access continuous learning and career development opportunities to enhance your professional growth;
Corporate Culture: Experience a vibrant corporate atmosphere with exciting parties and team-building events throughout the year;
Referral Bonuses: Refer talented friends and receive a bonus after they successfully complete their probation period;
Medical Insurance Support: Choose the private medical insurance plan that suits you and receive full or partial reimbursement of its cost;
Flexible Benefits: Customise your compensation by selecting activities or expenses you'd like the company to cover, such as a gym subscription, language courses, Netflix subscription, spa days, and more;
Education Foundation: Participate in a biannual raffle for a chance to learn something new, even if it is unrelated to your job, as part of the company's commitment to ongoing education.

Interview process:

A 30-minute interview with a Recruiter to get to know you and your experience;
A 1-hour first-stage technical interview with the DevOps team to assess your theoretical knowledge;
A 1-hour second-stage technical interview with the DevOps team to assess your hard skills;
A final 1-hour interview to gauge your fit with the company culture and working style.