
QA Analyst - Observability & Incident Management
- Heredia
- Permanent
- Full-time
- 3+ years of experience in application support, incident management, or system monitoring, with a solid grasp of ITIL concepts and incident workflows.
- Strong analytical and problem-solving skills, with keen attention to detail and effective communication in English across cross-functional teams.
- Proficient in Windows/Linux environments, with hands-on experience in SQL, Python, shell scripting, or PowerShell.
- Familiar with monitoring tools such as Datadog and Dynatrace, and with performance testing utilities.
- Basic understanding of cloud platforms, especially AWS and GCP.
- Exposure to backend systems and data pipelines, sufficient to support system operations and diagnostics.
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Monitor system performance and detect issues early using tools like Datadog or Dynatrace.
- Respond to incidents quickly and help restore normal operations.
- Assist in Root Cause Analysis (RCA) and document findings clearly.
- Create and maintain dashboards and alerts to improve system visibility.
- Work with engineering teams to improve automation and reduce recurring issues.
- Analyze logs and data using SQL and Python.
- Collaborate with QA and DevOps teams to test system performance.
- Support internal teams with performance metrics and status updates.
- Document processes and create reports for technical and non-technical audiences.
- Use ticketing tools like Jira or ServiceNow for tracking incidents and changes.