
DevOps / Site Reliability Engineer

Singapore

Engineering

About Solidus Labs

At Solidus, we are shaping the financial markets of tomorrow by providing cutting-edge trade surveillance technology that protects investors, enhances transparency, and ensures regulatory compliance across traditional financial assets and crypto markets. 

With over 20 years of experience in developing Wall Street-grade FinTech, our team delivers innovative solutions that financial institutions and regulators worldwide rely on to detect, investigate, and report market manipulation, financial crime, and fraud. Headquartered on Wall Street, with offices in Singapore, Tel Aviv, and London, we safeguard millions of retail and institutional entities globally, monitoring over a trillion events each day.


Position Overview

We are seeking an experienced Singapore-based DevOps / Site Reliability Engineer to join our DevOps team and own the reliability, stability, and operational support of our production systems.

This role focuses on production ownership, monitoring, incident response, and on-call support, providing critical operational coverage. You will work with a modern cloud-native stack and play a key role in keeping systems highly available, secure, and performant.


Key Responsibilities

Site Reliability & Production Ownership

  • Own the reliability, availability, and performance of production environments
  • Lead incident response, including troubleshooting, mitigation, and resolution
  • Perform root cause analysis (RCA) and drive corrective and preventive actions
  • Participate in on-call rotations and provide operational coverage

CI/CD & Automation

  • Troubleshoot and support CI/CD pipelines (GitLab CI) and deployment issues
  • Improve pipeline reliability, stability, and operational efficiency

Kubernetes & Cloud Operations

  • Operate and support production Kubernetes (EKS) environments
  • Manage scaling, availability, and capacity planning (KEDA, Karpenter, HPA)
  • Perform cluster upgrades, Helm deployments, and resource optimization
  • Troubleshoot Kubernetes, AWS, and cloud networking issues

Monitoring, Alerting & Observability

  • Design, maintain, and improve monitoring and alerting systems
  • Build dashboards and actionable alerts using Prometheus, Grafana, and EFK
  • Reduce alert fatigue by tuning thresholds and improving signal quality

Networking & Connectivity

  • Troubleshoot networking issues, including TLS, load balancing, VPCs, and NAT
  • Support and maintain VPN connectivity and network access

Infrastructure, Security & Operations

  • Maintain and evolve infrastructure using Terraform and Helm
  • Improve system resilience, performance, and cost-effectiveness
  • Apply cloud and container security best practices
  • Support operational aspects of compliance initiatives and security incidents

AI-Driven Engineering Practices

  • Leverage AI-powered tools as a standard part of daily work for troubleshooting, automation, and productivity improvements

Minimum Qualifications

  • 3+ years of hands-on DevOps / SRE experience
  • Strong production experience with Docker and Kubernetes
  • Solid knowledge of AWS (EKS, EC2, IAM, RDS, S3, CloudWatch, Lambda)
  • Experience with monitoring, logging, and alerting systems
  • Proficiency with Terraform, Helm, and GitLab CI
  • Strong troubleshooting skills across infrastructure, CI/CD, and networking
  • Scripting experience with Bash and Python
  • Fluent English and willingness to participate in on-call rotations
  • Familiarity with pub/sub systems (SQS, RabbitMQ, Kafka, or similar)


Nice to Have

  • Experience with Redis, Airflow, Databricks, or Spark/EMR
  • GitOps workflows and advanced Git usage
  • Experience supporting databases such as Postgres, Snowflake, or ClickHouse
  • Proficiency in Mandarin


Why Join Us?

Join a team where you’ll own and improve the reliability of critical production systems end to end, with real autonomy and impact, directly supporting premier clients globally. You’ll work on a modern, cloud-native stack operating at scale, tackling meaningful performance and resilience challenges. And you’ll do it alongside a highly collaborative, global DevOps and R&D team, sharing standards, tooling, and operational expertise across regions.


Hybrid position, Singapore-based.
