Linux Site Reliability Consultant
Pythian
- Costa Rica
- Permanent
- Full-time
- Operate, maintain, and administer solutions contributing to customer infrastructure's operational efficiency, availability, and visibility.
- Plan maintenance activities, design documentation, and standard procedures.
- Provide Root Cause Analysis reports for outages/incidents (ITIL - Problem Management)
- Observe and provide feedback on the current state of the client's infrastructure, and identify opportunities to improve resiliency, reduce incident occurrence, and automate repetitive administrative and operational tasks.
- Contribute to, improve, and maintain team documentation about client systems and infrastructure, procedures, policies, and schedules.
- Gather and document information about client environments through audit activities, and analyze the information to identify opportunities for improvement and application of best practices.
- Work collaboratively with teammates to contribute to the continuous improvement of our working culture.
- Act as a technology leader for clients, as well as drive client discussions on technology road maps.
- Participate in an on-call rotation in an escalation capacity.
- Experience working with Google Cloud and AWS (including infrastructure-as-code deployment with CloudFormation, Terraform, OpsWorks, etc.)
- Scripting and automation of administrative tasks using Python and Scala is a must
- Solid understanding of microservices architecture and container technologies (Kubernetes is a must; Docker, LXC, etc.)
- Clear understanding of software development lifecycles and best practices from an infrastructure point of view (pull requests, merges, rebases, etc.)
- Understanding of the end-to-end operation of a 'Business System' as a whole, as opposed to its individual components.
- Comprehensive systems hardware and network troubleshooting experience
- Common Linux distribution platform installation, configuration, performance tuning, and cloud migration.
- TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP, etc.)
- Operation and administration of virtual infrastructure, including experience with at least one hypervisor (VMware, Hyper-V, KVM, etc.)
- Ability to describe IaaS, PaaS, and SaaS, the pros and cons of each, and use cases for virtualization and cloud
- Administration of web servers and supporting technologies, including network load balancers
- Experience with the design, development, and deployment of Puppet
- System and application error investigation, troubleshooting of access/availability issues including deep multi-system root cause analysis
- Experience managing networking devices, such as switches and firewalls, from a variety of vendors
- Solid understanding of DevOps tools, processes, and culture
- Ability to pick up new technologies quickly
- Ability to provide accurate work scheduling and task estimations for work delivery
- Love your career: Competitive total rewards package.
- Love your coworkers: Collaborate with some of the best and brightest in the industry!
- Love your development: Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
- Love your workspace: We give you all the equipment you need to work from home including a laptop with your choice of OS, and an annual budget to personalize your work environment!
- Love yourself: Pythian cares about the health and well-being of our team. You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more). Additionally, you will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity.