About The Role
You will be part of the data infrastructure team, responsible for developing and maintaining the infrastructure behind our data pipelines and data warehouse while keeping its cost optimized. You will also help ensure that all stakeholders can securely access the data they need for analysis and decision-making.
What You Will Do
- Design and evaluate optimal data pipeline architectures.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, maintaining data pipeline monitoring, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS/GCP Big Data technologies.
- Collaborate with data scientists, engineers, and stakeholders to ensure effective deployment and integration of machine learning models in the AWS cloud environment using the relevant AWS services.
- Support data scientists in troubleshooting and debugging machine learning applications.
- Manage auto-scaling and performance monitoring of the data infrastructure, including for machine learning applications.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Work with data and analytics experts to improve the functionality of our data systems.
What We Are Looking For
- Bachelor's degree in Statistics, Mathematics, or Computer Science
- At least 3 years of experience as a Data Engineer
- Strong SQL skills, including writing efficient, optimized queries for data integration, storage, processing, and manipulation, and good knowledge of ETL and/or ELT tools
- Proficiency in one or more programming languages; Python is required
- Experience with cloud-based data-warehousing solutions such as BigQuery and Redshift
- Experience with AWS services such as SageMaker, EMR, S3, DynamoDB, and EC2
- Experience with DevOps tools (such as GitHub Actions) and infrastructure-as-code is a plus
- Knowledge of data security measures, including role-based access control (RBAC) and data encryption, is a plus
- Good understanding of data quality and governance, including implementing data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent, is a plus
- Ability to communicate in English