Site Reliability Engineer/System Administrator Job

Nairobi, Kenya
January 8, 2024

Job Description

About ENGIE Energy Access 

ENGIE Energy Access is the leading Pay-As-You-Go (PAYGo) and mini-grids solutions provider in Africa. The company develops innovative, off-grid solar solutions for homes, public services and businesses, giving customers and distribution partners access to clean, affordable energy. The PAYGo solar home systems are financed through affordable instalments from $0.19 per day, and the mini-grids foster economic development by enabling productive use of electricity and creating business opportunities for entrepreneurs in rural communities. ENGIE Energy Access has over 1,800 employees, operations in nine countries across Africa (Benin, Côte d’Ivoire, Kenya, Mozambique, Nigeria, Rwanda, Tanzania, Uganda and Zambia), over 1.9 million customers, and more than 9 million lives impacted so far. It aims to impact 20 million lives across Africa by 2025.

Job Summary:

We are seeking a talented and experienced System Administrator/Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems and services. You will collaborate with cross-functional teams to implement and maintain robust infrastructure solutions, focusing on automation, monitoring, and incident response. The ideal candidate is passionate about optimizing and enhancing system reliability, possesses strong problem-solving skills, and is committed to driving excellence in operational practices.

Key Responsibilities:

  1. Infrastructure Automation:
    • Develop and maintain automation tools and scripts for provisioning, configuration, and deployment.
    • Implement infrastructure as code (IaC) practices to ensure consistency and reproducibility.
  2. Monitoring and Incident Response:
    • Set up and maintain monitoring systems to detect and respond to performance issues and outages.
    • Participate in on-call rotations, respond promptly to incidents, troubleshoot them, and implement solutions to prevent recurrence.
  3. Performance Optimization:
    • Optimize system performance through continuous analysis and tuning.
  4. Reliability Engineering:
    • Implement best practices for reliability, such as error budgets, SLIs/SLOs, and blameless post-mortems.
    • Work towards minimizing manual intervention through automation.
  5. System Administration:
    • Manage and maintain server infrastructure, including installation, configuration, and troubleshooting of operating systems.
    • Implement and maintain security measures, such as firewalls and intrusion detection systems.
    • Perform regular system backups and recovery procedures.
  6. Collaboration and Communication:
    • Collaborate with cross-functional teams to align infrastructure and operational requirements.
    • Provide technical guidance and support to colleagues in areas related to reliability.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Proven experience as a Site Reliability Engineer or System Administrator.
  • Strong Linux and Bash scripting skills.
  • Proficiency in cloud platforms (e.g., AWS, Azure, GCP, Linode, DigitalOcean).
  • Experience with containers and container orchestration tools (e.g., Kubernetes, Docker, LXD).
  • In-depth knowledge of networking, security, and system administration.
  • Familiarity with infrastructure as code tools (e.g., Terraform, Ansible).
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.

Preferred Qualifications:

  • Experience with CI/CD pipelines and related tools.
  • Knowledge of distributed systems and microservices architecture.
  • Familiarity with observability tools (e.g., Prometheus, Grafana, ELK stack).
  • Familiarity with programming languages (e.g., Python, Ruby).

Join our team and contribute to building and maintaining a reliable and scalable infrastructure that serves and impacts millions across Africa. If you thrive in a fast-paced, collaborative environment and are passionate about ensuring system reliability, we want to hear from you!

Business Unit:  GBU Flexible Gen & Retail

Division:  Energy Access

Legal Entity:  FENIX INTERNATIONAL UGANDA LIMITED COMPANY

Contract Type:  Permanent

Job Type:  Full-Time

Professional Experience:  Skilled (more than 3 and fewer than 15 years of experience)

Education Level:  Bachelor’s Degree