Senior Site Reliability Engineer, IT
Zipline
This job is no longer accepting applications
What You'll Do
As a Senior Site Reliability Engineer, you will play a crucial role in maintaining and improving our systems and infrastructure. You will work closely with cross-functional teams, including software, hardware, and operations teams, to design, implement, and optimize our systems for high availability, fault tolerance, and scalability. You will also be responsible for proactively identifying potential issues and bottlenecks, driving incident response and post-incident analysis, and implementing automation and monitoring solutions.
Responsibilities:
- Design, develop, and maintain highly reliable and scalable systems and infrastructure.
- Collaborate with software engineering and DevOps teams to ensure the smooth integration and deployment of applications and services.
- Implement and improve monitoring, alerting, and observability solutions to proactively identify and resolve potential issues.
- Automate infrastructure provisioning, configuration management, and deployment processes using modern tools and technologies.
- Conduct system performance analysis and optimization to ensure efficient resource utilization and optimal response times.
- Participate in incident response and resolution, conducting post-incident analysis and implementing preventive measures.
- Continuously evaluate and implement best practices and industry standards in site reliability engineering.
- Mentor and provide technical guidance to junior members of the team, fostering a culture of continuous learning and improvement.
- Collaborate with cross-functional teams to define and refine system requirements, capacity planning, and disaster recovery strategies.
- Work with the on-site operations team to ensure proper management, maintenance, and scaling of hardware components.
- Coordinate and perform occasional international travel to support deployments and infrastructure setup and to collaborate with remote teams.
What You'll Bring
- 9+ years of experience in a similar role, with a proven track record of designing, implementing, and managing highly available and scalable systems.
- Deep understanding of Linux/Unix systems administration and experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Proficiency in at least one programming language (e.g., Python, Go, Java) and experience with infrastructure-as-code tools (e.g., Terraform, Ansible, AWS CDK).
- Strong knowledge of networking principles, including TCP/IP, DNS, load balancing, and firewalls.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) and incident management systems.
- Solid understanding of distributed systems, microservices architecture, and cloud-native application development.
- Experience working with on-site operations teams, such as coordinating the management, maintenance, and scaling of hardware components.
- Strong troubleshooting and problem-solving skills, with the ability to analyze complex systems and identify performance bottlenecks.
- Excellent communication skills, both written and verbal, with the ability to effectively collaborate with cross-functional teams.
- Willingness to travel internationally on occasion to support deployments and collaborate with remote teams.
- Experience with both production and internal tooling environments is desired.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator) are a plus. Experience working with SSO providers (e.g., Okta, ADFS) is also a plus.
What Else You Need to Know
The starting cash range for this role is $160,000 - $200,000. Please note that this is a target starting cash range for a candidate who meets the minimum qualifications for this role. The final cash pay for this role will depend on a variety of factors, including a specific candidate's experience, qualifications, skills, working location, and projected impact. The total compensation package for this role may also include: equity compensation; discretionary annual or performance bonuses; sales incentives; benefits such as medical, dental and vision insurance; paid time off; and more.