
Site Reliability Engineer (SRE)

Our partner is seeking a Site Reliability Engineer to join their high-stakes iGaming platform, ensuring rock-solid uptime, blazing performance, and a secure infrastructure for thousands of real-time bets across casino games, sports, and payments.

Role Overview:

You will help run a platform that handles thousands of bets per second across casino tables, sports events, and real-time payments. If you thrive under pressure, love automation, and know that every millisecond matters, this is your arena.

As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure the services and platforms are reliable, scalable, and performant — even under high transaction volumes and regulatory requirements.

You'll work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation and observability, and you'll be among the first responders to incidents.

Key Responsibilities:
  • Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., live games, sports betting, KYC, payments);
  • Manage and support highly available, scalable infrastructure (K8s, cloud and bare metal);
  • Implement and manage monitoring, logging, and alerting (e.g., Prometheus, Grafana, Loki, ELK);
  • Automate deployments and operations using CI/CD pipelines (Jenkins, ArgoCD, Helm, etc.);
  • Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR);
  • Participate in on-call rotation to ensure 24/7 system reliability;
  • Secure infrastructure in line with regulations (e.g., player data integrity, jurisdictional compliance);
  • Collaborate with Dev, QA, DevOps and Ops to improve service stability and uptime.
Ideal profile for the position:
  • Experience with AWS or hybrid data center setups;
  • Ability to read logs and stack traces to determine the root cause of an incident;
  • Infrastructure as Code: Terraform, Helm, Ansible, and (optionally) Werf;
  • Linux administration and container orchestration (K8s) skills;
  • Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.;
  • Strong understanding of TCP/IP, DNS and load balancers;
  • Familiarity with incident response, postmortems, and blameless culture;
  • Availability to work between 17:00 and 08:00 CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.
What will be an advantage:
  • Background in high-throughput environments (e.g., financial, trading, iGaming);
  • Experience with CDNs and real-time log aggregation;
  • Proficiency in one or more scripting languages (Python, Bash, Go);
  • Knowledge of Java and/or PHP and their respective web development frameworks;
  • Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.;
  • Exposure to Kafka, Redis or other event-driven systems.
Success Metrics:
  • Less than 1% downtime for any user- or partner-facing service;
  • SLO 99.95%;
  • 95% of infrastructure managed via code and automation;
  • Documented runbooks and alert playbooks per service group.
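To put the availability figures above in concrete terms, the following minimal Python sketch converts an availability target into a downtime allowance. The helper name `allowed_downtime_minutes` and the 30-day window are illustrative assumptions; the posting does not state how or over what period these targets are measured.

```python
# Hypothetical helper (not part of the posting): converts an availability
# target into the downtime it permits over a given window.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Return the downtime budget in minutes for a window of `days` days."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * days * MINUTES_PER_DAY


if __name__ == "__main__":
    # Assumed 30-day window; the two targets come from the list above.
    for target in (99.0, 99.95):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days: {budget:.1f} minutes of downtime")
```

Under these assumptions, 99% availability allows 432 minutes of downtime per 30 days, while 99.95% allows about 21.6 minutes, so the SLO is the stricter of the two targets.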
The company guarantees you the following benefits:
  • Global Collaboration: Join an international team where everyone treats each other with respect and moves towards the same goal;
  • Autonomy and Responsibility: Enjoy the freedom and responsibility to make decisions without the need for constant supervision;
  • Competitive Compensation: Receive a competitive salary that reflects your expertise and knowledge, as our partner seeks top performers;
  • Remote Work Opportunities: Embrace the flexibility of fully remote work, with the option to visit company offices that align with your current location;
  • Flexible Work Schedule: Focus on performance, not hours, with a flexible work schedule that promotes a results-oriented approach;
  • Unlimited Paid Time Off: Prioritise work-life balance with unlimited paid vacation and sick leave days to prevent burnout;
  • Career Development: Access continuous learning and career development opportunities to enhance your professional growth;
  • Corporate Culture: Experience a vibrant corporate atmosphere with exciting parties and team-building events throughout the year;
  • Referral Bonuses: Refer talented friends and receive a bonus after they successfully complete their probation period;
  • Medical Insurance Support: Choose the right private medical insurance and receive compensation (full or partial) based on the cost;
  • Flexible Benefits: Customise your compensation by selecting activities or expenses you'd like the company to cover, such as a gym subscription, language courses, Netflix subscription, spa days, and more;
  • Education Foundation: Participate in a biannual raffle for a chance to learn something new unrelated to your job as part of your commitment to ongoing education.
Interview process:
  1. HR interview with the recruiter;
  2. Technical interview;
  3. Final interview with the team.
If you find this opportunity right for you, don't hesitate to apply or get in touch with us if you have any questions!

Job Specifications:

  • Role Occupation: Remote;
  • Location: Remote, Tbilisi 🇬🇪, London 🇬🇧, Limassol 🇨🇾, Yerevan 🇦🇲;
  • Role Direction: Software Engineering, IT;
  • Seniority Level: Middle, Senior.

Recruiter:

Baia Devsurashvili

Recruiter

