Site Reliability Engineer
- Job Ref: 3634
- Location: Dublin, Ireland
- Type: Permanent
What will I be doing?
System Administration & Site Reliability
- Whiteboard a fix to a scaling problem, and then make it happen.
- Participate in the operations on-call rotation, triaging and addressing production issues as they arise.
- Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems.
- Design and develop new highly-available infrastructure to meet the needs of our growing and evolving product.
- Install new servers or rebuild existing ones, and configure hardware, peripherals, services, settings, directories, storage, etc. in accordance with standards and project/operational requirements.
Operations and Support
- Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
- Perform regular security monitoring to identify intrusion patterns.
- Perform daily backup operations, including restorative testing.
- Monitor resource utilisation and recommend solutions.
- Manage user provisioning and automated provisioning systems.
- Provide escalation engineering support to other teams.
- Repair and recover from hardware or software failures. Coordinate and communicate with the affected teams and users.
- Assist in applying OS patches and upgrades on a regular basis, and upgrade administrative tools and utilities. Configure or add new services as necessary.
- Contribute to system configuration and asset management applications.
What skills do I need?
- Bachelor's (4-year) degree with a technical major, such as engineering or computer science.
- At least four years of production system administration/SRE experience.
- At least two years of experience serving a large-scale SaaS web application using AWS services.
- Ability to analyse and optimise performance in high-traffic internet applications.
- Thorough understanding of common Internet protocols (e.g. HTTP, DNS, SMTP).
- Familiarity with APIs used for monitoring, management, user provisioning, and SSO.
- Ability to solve complex, high-impact problems.
- Ability to digest and discuss issues and solutions with team members who may not be familiar with the terminology or technologies involved.
- Excellent communication skills and a strong team player.
Don’t worry if you don’t tick every box; we still encourage you to apply. We’re always happy to review applications and take all experience into consideration, and we do our utmost to provide feedback where we can!
Not required but considered a big plus
- Experience maintaining uptime of a production Ruby/Rails app.
- Experience with CI/CD tools.
- Experience with scripting languages such as Perl, Python, or similar, and with DevOps responsibilities.
- Certification in AWS, any PaaS, and/or related technologies.