Job Description
We are seeking an experienced Site Reliability Engineer (SRE) to join our Infrastructure team, following the creation of a new internal structure.
Main Responsibilities
As an SRE, you will play a critical role in ensuring the reliability, availability, and performance of our technology platforms, as well as supporting the Software Engineering team in deploying and operating their applications in the cloud.
Key Areas of Focus
* System Reliability: Ensuring the reliability and availability of our platforms and technology systems through robust monitoring, reporting, and incident response procedures.
* Infrastructure Automation: Automating the deployment, scaling, and management of services and infrastructure components for critical applications such as our digital channel and branch applications.
* Resource Planning: Collaborating with cross-functional teams to forecast and plan future resource requirements for all infrastructure systems.
* Performance Optimization: Analyzing platform performance to improve efficiency and ensure an optimal experience for our users and end customers.
* Incident Management Support: Participating in troubleshooting sessions, supporting operational and application teams, analyzing monitoring data and root causes, and proposing tactical and strategic solutions.
* Security: Supporting the implementation and maintenance of security best practices, and participating in vulnerability assessments and threat mitigation.
* Continuous Improvement: Continually improving system reliability through root cause analysis, incident reporting, and proactive maintenance and evolution of systems and platforms.
Requirements
We would like you to have the following experience:
* Excellent knowledge of Terraform and Ansible.
* Excellent understanding of containerization technologies (e.g., Docker, containerd).
* Strong expertise in Kubernetes management and its most common components (e.g., ingresses, monitoring stacks, custom autoscalers).
* Strong troubleshooting skills.
* Solid understanding of delivery systems (e.g., Helm and GitOps).
* Solid understanding of distributed systems architecture.
* Good knowledge of at least one major cloud provider.
* Good scripting and programming skills (e.g., Bash, Python, Go).
* Good understanding of networking.
* Good experience with Oracle DB, MongoDB, and PostgreSQL.
Nice to Have
Previous experience with GCP, AWS, or Azure. Experience with common distributed systems such as caching systems (e.g., Redis), message brokers (e.g., RabbitMQ), and log collection systems (e.g., ELK).
About Us
You will be part of an international group that invests in the future and sustainability, oriented towards innovation and attentive to young people and human development.
Find out more about Crédit Agricole: Our Group, The Selection Process, Interview Preparation, Benefits and Smart Working, LinkedIn, Indeed.