Safaricom PLC
Service Availability DevOps Engineer at Safaricom Kenya
Job Description
Reporting to the Engineering Lead – Service Availability, the position holder will be responsible for monitoring and observability, and for improving the operational aspects of all in-scope systems within DIT. The role will drive automation and DevOps across the different domains and foster service monitoring through proactive initiatives such as AIOps and machine learning, among other available channels.
RESPONSIBILITIES
- Proactively building and implementing monitoring services, including end-to-end monitoring, scripting and automation, modern tooling, and maintenance software.
- Using AI and machine learning to perform log analysis and create predictive models that help identify potential failures.
- Developing and executing automation scripts and maintenance jobs.
- Developing automation around monitoring.
- Onboarding DIT systems to the service monitoring tools (APMs like ELK).
- Clearly documenting any monitoring gaps noted and collaborating with the relevant teams to ensure timely closure.
- Performing application error analysis and follow-up to ensure optimal customer experience.
- Deploying planned and operational changes on systems in scope.
- Supporting all Digital squads to ensure new products are monitored.
- Supporting zero-touch operations initiatives.
- Supporting the development of collectors and agents.
QUALIFICATIONS
- Bachelor’s degree in Computer Science, Information Technology, Electrical and Communication Engineering, Business Information Systems, or a relevant telecommunications field.
- Domain knowledge in at least two of the following areas: system administration (especially Linux), orchestration (Kubernetes), the Linux kernel, and OpenTelemetry.
- Good understanding of back-end programming languages such as Python and Rust.
- Technical understanding of SRE concepts and DevOps practices for providing stable services to customers, including adhering to availability KPIs, Service Level Objectives (SLOs), and Service Level Indicators (SLIs), and staying within the target monthly error budget.
- Well versed in one or more modern monitoring tools such as ELK, Prometheus, Dynatrace, AppDynamics, New Relic, or Splunk.
- Good understanding of microservice architecture and an appreciation of traditional (classic) SOA.
- Ability to manage a team, with leadership skills, ownership of issues, an analytical mindset, and problem-solving ability.
- Ability to implement a strict change management policy.
- Conversant with agile ways of working.