Job Title: Site Reliability Engineer (SRE)
We are seeking an experienced Site Reliability Engineer (SRE) to join our Infrastructure team. As an SRE, you will play a critical role in ensuring the reliability, availability, and performance of our technology platforms, as well as supporting the Software Engineering team to deploy and operate their applications in the cloud.
We'll work together in several key areas:
* System Reliability: Ensuring the reliability and availability of our platforms and technological systems through robust monitoring, reporting, and incident response procedures.
* Infrastructure Automation: Automating the deployment, scaling, and management of services and infrastructure components for critical applications such as the digital channel and branch applications.
* Resource Planning: Collaborating with cross-functional teams to forecast and plan future resource requirements for all infrastructure systems.
* Performance Optimization: Analyzing platform performance to improve efficiency, ensuring an optimal experience for our users and end customers.
* Incident Management Support: Participating in troubleshooting sessions, supporting operational and application teams, analyzing monitoring data and root causes, and proposing tactical and strategic solutions.
* Security: Supporting the implementation and maintenance of security best practices, and participating in vulnerability assessments and threat mitigation.
* Continuous Improvement: Continually improving system reliability through root cause analysis, incident reporting, and proactive maintenance and evolution of systems and platforms.
We would like you to have the following experience:
* Excellent knowledge of Terraform and Ansible
* Excellent understanding of containerization technologies (e.g., Docker, containerd)
* Strong expertise in Kubernetes management and its most common components (e.g., ingresses, monitoring stacks, custom autoscalers)
* Strong troubleshooting skills
* Solid understanding of delivery systems (e.g., Helm and GitOps)
* Solid understanding of distributed systems architecture
* Good knowledge of at least one major cloud provider
* Good knowledge of scripting and programming languages (e.g., Bash, Python, Go)
* Good understanding of networking
* Good experience with Oracle DB, MongoDB, and PostgreSQL
Nice to have:
* Previous experience with GCP, AWS, or Azure
* Experience with common distributed systems such as caching systems (e.g., Redis), message brokers (e.g., RabbitMQ), log collection systems (e.g., ELK)
We will make sure you always have:
* Autonomy and responsibility: You will be free to choose, try, fail, and try again. We believe that getting involved is the first step to making a difference.
* Career opportunities: You will be evaluated every six months and your results will guide your growth path.
* Continuous training: We believe in talent and we like to cultivate it. You will have access to training and refresher courses where you can learn from industry experts.
* Stimulating environment: We work dynamically and collaboratively across teams, so you will work alongside talented professionals on consistently challenging activities.
Work Location: Parma, Milan, or Sondrio (smart working is possible at all locations).
We appreciate your interest in our group. Candidates with matching profiles will be contacted directly by the Human Resources function of Credit Agricole Italia.
You will join a fast-growing international group that invests in the future and in sustainability, is oriented towards innovation, and is attentive to young people and the development of human potential.