About the company:
Whizdom AI is an early-stage AI startup building products around recommendation systems, personalisation, and GenAI agents. We are a small team working directly on real customer problems, shipping quickly, measuring outcomes, and iterating fast. Everyone here is expected to take ownership, improve systems proactively, and help build the engineering foundations the company will scale on.
The Role:
We are hiring a Founding AI Platform Engineer to own the systems that make the company's ML and GenAI products reliable, deployable, observable, and scalable. This role sits at the intersection of backend engineering, infrastructure, MLOps, and product delivery. You will build the production layer around training, evaluation, deployment, serving, CI/CD, experimentation, and monitoring. In a team this size, the role spans backend services, infrastructure, tooling, and reliability work. Your job is to make sure promising ML and GenAI capabilities become stable, customer-ready systems.
What You’ll Own:
- Build and maintain the infrastructure and tooling used to train, evaluate, deploy, and monitor ML models and GenAI services;
- Own production services, APIs, and pipelines that power recommendations, agent workflows, and customer-facing integrations;
- Improve CI/CD, testing, release workflows, rollback processes, and environment management;
- Establish observability across service health, model behaviour, agent quality, latency, cost, and failure modes;
- Build reproducibility and lifecycle practices for models, prompts, datasets, configurations, and releases;
- Support experimentation and measurement infrastructure so product and ML changes can be evaluated cleanly;
- Improve reliability, scalability, security, performance, and cost efficiency across the stack;
- Troubleshoot production issues end-to-end and turn recurring pain points into durable engineering improvements;
- Help define the platform and engineering standards the company will rely on as it grows.
What Success Looks Like in the First 6 Months:
- Shipping a model or GenAI change to production becomes faster, safer, and less manual;
- Core services and AI workflows are observable and easier to debug;
- The platform supports more usage with better reliability and lower operational friction;
- Engineers spend less time fighting infrastructure and deployment issues and more time shipping product;
- You become the person who spots platform, reliability, and scaling risks early and addresses them before they become problems.
What We’re Looking For:
- Strong software engineering background with experience building and operating production systems;
- Experience with backend services, cloud infrastructure, CI/CD, testing, observability, and automation;
- Strong Python skills and comfort working across services, tooling, infrastructure, and operational workflows;
- Good judgment about reliability, performance, maintainability, and cost tradeoffs;
- Ability to collaborate closely with ML and product teams and move ambiguous work to completion;
- High ownership, attention to detail, and a bias toward simplifying and strengthening systems.
Nice to Have:
- Experience with MLOps workflows for model training, evaluation, deployment, and monitoring;
- Experience serving ML models or LLM applications in production;
- Experience with experimentation platforms, event pipelines, analytics instrumentation, or feature delivery platforms;
- Experience with agent evaluation, prompt versioning, retrieval/search infrastructure, or vector-backed systems;
- Experience supporting customer-facing APIs or SaaS platform infrastructure.
A Note on Fit:
You do not need to have every tool on your resume to be a strong fit. We care about engineers who learn quickly, take ownership, and can build reliable systems in the real world.
Why Join:
- Direct access to the founders and real influence over the platform and engineering direction;
- High-ownership role with room to shape the production foundations of the company from the beginning;
- Opportunity to work on the systems behind recommendation engines and GenAI products that customers actually use;
- Flexible remote environment (strong overlap with European time zones preferred);
- Small team, low bureaucracy, and a lot of room to grow.