Site Reliability Engineer

Sailthru

Software Engineering
Multiple locations
Posted on Jun 14, 2024

Marigold helps brands foster customer relationships through the science and art of connection. Marigold Relationship Marketing is a suite of world-class martech solutions that help marketers create long-term customer love and loyalty. Marigold’s products span Messaging, Loyalty and Experiences, serving a customer base divided into three segments: Enterprise, Professional and Commercial. Marigold provides the most comprehensive set of use cases for marketers at any level. Headquartered in Nashville, TN, Marigold has offices across the United States, Europe, Australia, New Zealand, Malaysia, India, South America, Central America and Japan.

Site Reliability Engineers are an integral part of our engineering organisation, working closely with Product, Security, and Operations Teams to deliver high-quality customer experiences and to allow our systems to scale with ever-increasing growth.

Campaign Monitor is seeking a Site Reliability Engineer to join our growing SRE team. The ideal candidate is a gifted problem solver who can work in unfamiliar codebases and has experience diagnosing system-wide issues. You should be comfortable integrating metrics into code and setting up monitoring that rapidly detects degraded performance.

We send over 2 billion emails every month, and our infrastructure needs to scale accordingly while delivering the best possible user experience.

You will work with languages such as C#, Java, and Go, whilst implementing new features and changes in the following types of systems:

  • Event-driven microservices and APIs in a distributed architecture

  • Dynamic web applications (ReactJS, ASP.NET Core, Java)

  • Infrastructure development on AWS (EC2, ECS, EMR, SNS/SQS, RDS, ElastiCache) using Terraform

  • Event streaming & big data solutions (Kafka, Spark, Airflow)

What you’ll do:

  • Solve problems affecting core services and build automation to prevent them from recurring, with the goal of automating the response to all non-exceptional service conditions.

  • Facilitate root cause analysis sessions and communicate findings back to engineering teams.

  • Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Campaign Monitor's services.

  • Influence and create new designs, architectures, standards and methods for large-scale distributed systems.

  • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.

  • Deploy and monitor servers in multiple data centres and the cloud.

  • Automate repetitive tasks required to maintain a secure and up-to-date operational environment.

  • Develop, improve, and maintain the tooling and processes used to manage our infrastructure.

  • Be available to handle and resolve issues escalated from the production environment as part of an on-call roster outside of Sydney business hours.

  • Measure everything. Report on interesting events and alert on critical issues.

  • Create and update documentation.

  • Work with other teams to build, test and roll out systems that are resilient, robust, and scalable.

About You

Essential

  • BA/BS degree in Computer Science or a related field (or, in lieu of a degree, 5+ years of relevant industry experience).

  • Strong fluency in C# and strong scripting skills. Knowledge of other programming languages such as Java or Go is a bonus.

  • Understanding of distributed systems architecture and design best practices.

  • Commercial hands-on experience with AWS.

  • You’ve used a range of storage engines (SQL, Elasticsearch, Cassandra) and know when each type is useful.

  • You know how to use browser DevTools or similar tools to improve web application performance.

  • Effective communication skills, both in interactive settings and in written documentation.

  • Knowledge and experience containerising applications using Docker and deploying to AWS ECS.

Desirable

  • 5+ years' experience as an SRE/platform software engineer.

  • Experience with big data systems such as Elasticsearch, Cassandra or Spark.

  • You know how web applications work, from the underlying network protocols (HTTP, TCP) through to web servers (IIS, nginx), browser behaviour and everything in between.

  • Hands-on experience working with and administering self-managed PostgreSQL databases.