DevOps / SRE Manager
- Job Ref: 6060
- Location: Dublin, Ireland
- Type: Permanent
We are looking for an experienced DevOps/Site Reliability Engineering Manager to bootstrap, grow and lead a new SRE team. You will define and set up processes, ways of working, logging, alerting, on-call rotation, metrics and KPIs for the private PaaS cloud of one of our customers, with an ever-watchful eye on its availability, latency, performance and capacity.
Our teams are part of an expanding global-scale innovation project, creating automation services to speed up R&D work for tens of thousands of engineers.
We are searching for someone with both a managerial and a hands-on approach who is willing to contribute to all required tasks. The right candidate is skilful, communicative and dynamic, takes ownership of their deliverables, has experience with agile software development and with multiple stakeholders at various levels, and ideally has previous experience with complex, large-scale systems.
- Hands-on technical experience.
- 5+ years of mixed professional experience as a Software Engineer, DevOps Engineer or Site Reliability Engineer
- Proven expertise in recruiting and managing a team of enthusiastic, experienced engineers on large-scale projects.
- Previous experience setting up on-call, escalation, monitoring and alerting systems
- Capable of technical deep-dives into code, networking, operating systems and storage, yet verbally and cognitively agile enough to hold your own in a strategy discussion with the customer’s architects and engineers
- Deploying, managing and optimizing container orchestration using Kubernetes and Helm
- Monitoring and alerting technologies: Grafana, Prometheus, Zabbix, Graphite.
- ELK stack knowledge
- Experience in one or more of C, C++, Java/Kotlin, Go, Python
- Scripting experience: Shell, Perl and/or Python.
- CI/CD pipeline management and associated tools: Jenkins, Spinnaker, GitLab CI
- Source control tools: Git, GitLab, Gerrit.
- Experience working in Agile Scrum teams
- Proficiency in algorithms, data structures, complexity analysis and software design and/or expertise in Unix/Linux systems, IP networking, performance and application issues.
- Expertise in problem solving and in analyzing global-scale distributed systems.
- Effective management and communication skills.
- Previous experience with MySQL, MariaDB, PostgreSQL, Cassandra, Redis, Galera Cluster
- RabbitMQ/Kafka/ActiveMQ knowledge
- Experience with public clouds such as AWS/Azure/GCP
- Infrastructure-as-code tools such as Terraform.