DevOps / Site Reliability Engineer

Singapore

Engineering

About Solidus Labs

At Solidus, we are shaping the financial markets of tomorrow by providing cutting-edge trade surveillance technology that protects investors, enhances transparency, and ensures regulatory compliance across traditional financial assets and crypto markets.

With over 20 years of experience in developing Wall Street-grade FinTech, our team delivers innovative solutions that financial institutions and regulators worldwide rely on to detect, investigate, and report market manipulation, financial crime, and fraud. Headquartered on Wall Street, with offices in Singapore, Tel Aviv, and London, we safeguard millions of retail and institutional entities globally, monitoring over a trillion events each day.

Position Overview

We are seeking an experienced Singapore-based DevOps / Site Reliability Engineer to join our DevOps team and own the reliability, stability, and operational support of our production systems.

This role focuses on production ownership, monitoring, incident response, and on-call support, providing critical operational coverage. You will work with a modern cloud-native stack and play a key role in keeping systems highly available, secure, and performant.

Day to Day

Own the reliability, availability, and performance of our production environments.
Lead incident response end-to-end, including troubleshooting, mitigation, and resolution.
Perform deep-dive root cause analysis (RCA) to drive long-term corrective and preventive actions.
Operate production Kubernetes (EKS), including cluster upgrades and Helm deployments.
Manage scaling and capacity using KEDA, Karpenter, and HPA for resource optimization.
Evolve infrastructure as code using Terraform and Helm, following security best practices.
Support GitLab CI/CD pipelines, resolving deployment issues and improving stability.
Design observability systems using Prometheus, Grafana, and EFK to reduce alert fatigue.
Troubleshoot and resolve networking issues involving TLS, load balancing, VPCs, NAT, and VPN.
Support compliance initiatives and respond to security-related incidents.
Leverage AI-powered tools as a standard part of your workflow for automation and productivity.
Participate in on-call rotations to provide consistent operational coverage.

Minimum Qualifications

3+ years of hands-on DevOps / SRE experience
Strong production experience with Docker and Kubernetes
Solid knowledge of AWS (EKS, EC2, IAM, RDS, S3, CloudWatch, Lambda)
Experience with monitoring, logging, and alerting systems
Proficiency with Terraform, Helm, and GitLab CI
Strong troubleshooting skills across infrastructure, CI/CD, and networking
Scripting experience with Bash and Python
Fluent English and willingness to participate in on-call rotations
Familiarity with pub/sub systems (SQS, RabbitMQ, Kafka, or similar)

Nice to Have

Experience with Redis, Airflow, Databricks, and Spark/EMR
GitOps workflows and advanced Git usage
Experience supporting databases such as Postgres, Snowflake, or ClickHouse
Proficiency in Mandarin

Why Join Us?

Join a team where you’ll own and improve the reliability of critical production systems end to end, with real autonomy and impact, directly supporting premier clients globally. You’ll work on a modern, cloud-native stack operating at scale, tackling meaningful performance and resilience challenges. And you’ll do it alongside a highly collaborative, global DevOps and R&D team, sharing standards, tooling, and operational expertise across regions.

Hybrid position, Singapore based.
