Lead Data Engineer

  • Administrative / Logistics
  • București, RO

Description
SuperAwesome focuses on developing products that ensure child safety and privacy in the digital space – this is the company's mission.
As the Tech Lead of Data Engineering, you’ll be at the forefront of how we shape, store, and use data. You’ll lead a team of highly skilled, mission-driven engineers committed to making the internet safer for the next generations. You’ll work with the other leads across technology, product, and other departments to identify areas of improvement in our overall approach to data engineering, lay the groundwork for AI-driven capabilities in our products, and give everyone an easy, accessible path to making data-driven decisions. Together with our Chief Architect and our Infosec and Legal functions, you’ll ensure the overall data and information architecture is kept safe and secure. You’ll work with your team to refine the roadmap and break it down into deliverables of the highest quality and impact. As a lead, you’ll champion continuous improvement, always aiming to improve the product your team owns and measuring your impact with the appropriate tech, product, or delivery metrics.

Technologies
Python
SQL
Databricks/Snowflake
ETL
AWS/GCP
Requirements
Must-have technologies:
Expert understanding of data pipeline design and implementation using Databricks and Python (or a Python derivative such as PySpark)
Expert visualisation skills using Sisense and/or other visualisation tools
Expert-level SQL
Expert understanding of data management and/or data governance (making sure the data is of the expected volume, schema, etc.)
Experience with modern cloud-based Data Engineering on AWS or other cloud platforms
Nice to have:

Experience with Kedro on Databricks
Experience in designing and implementing Data mesh
Exposure to Airbyte reading from multiple data sources
Good understanding of microservices architecture principles
Responsibilities
Be accountable for:
- design and implementation of features and services for SuperAwesome’s data lake, ingestion and distribution systems
- availability and performance of our data lake, ingestion and distribution systems
- design and implementation of ingestion, enrichment and transformation pipelines
Lead system design activities aimed at creating clarity and alignment and at ascertaining feasibility
Work across the full stack depending on where you can drive the highest impact: from ETL pipelines to data warehousing to visualisations, as well as testing and cloud infrastructure.
Use your experience in data management and governance to define and implement best practices across our systems ensuring security, consistency and quality across all of our data.
Develop and optimise Spark jobs, notebooks, and data pipelines within the Databricks environment.
Work with the Chief Architect and ML engineers to define an information architecture capable of enabling an easy implementation of data models and catalogues within the data lake, ensuring data quality, integrity, and optimal performance.
Train/mentor other engineers inside and outside of the team on Data engineering best practices
Frequently review system, delivery and quality metrics and drive the right initiatives to improve them, ensuring long-term quality, scalability and maintainability of our systems
Set the bar in what good quality data looks like, and use your expert knowledge to train and mentor other engineers to improve data quality
Ensure business continuity by keeping the design choices well documented and explained, from raw data analysis to defining and implementing data ingestion/enrichment pipelines to creating data visualisations
Champion the DataOps culture, support data systems in production, including participation in our out-of-hours on-call rota
Clearly communicate our data platform strategy and plans to others in the company
Recruitment Process
Initial screen (Alcor’s side) - 30 minutes;
Technical screen (video call with an Engineering Manager) - 45 minutes; 12-15 quick questions (different for each position) to understand the shape of a candidate's technical knowledge;
Code test (completed at home; typically takes 1.5 to 2 hours, with no strict time limit) - the scope differs for each role;
Systems design / final interview / cultural-fit interview
Benefits
PTO - 25 days per year
Summer Fridays - last Friday in June to last Friday in August is a half day
Winter Break - two week company closure over Christmas and New Year
Volunteer Days - two per year
Sick Leave - 10 days per year
Sabbatical - 30 days after 7 years of service
Referral Program for bringing us new team members
Annual company offsite in the UK
Team social events and treats (online and in person)
Laptop provided along with SA swag
Huntly Insights
Dear Recruiters,
To make your search easier, we have prepared some insights for you:

Specific proficiencies we seek (must-haves):

SQL: 5+ years of experience
Python: 5+ years of experience
Databricks or other data lake technologies (e.g. Snowflake): 5+ years of experience
ETL processes: 5+ years of experience
Airbyte or other ETL tools (e.g. Fivetran, Stitch): 4 years of experience
Sisense or other dashboarding tools (e.g. Power BI, Tableau, Looker): 4 years of experience
Generic proficiencies we seek (portable knowledge; the more, the better):

Designing and implementing large-scale data processing systems
Cloud-based data platforms (e.g. AWS, GCP)
Data lakes (Databricks, Snowflake)
Databases: a SQL db (e.g. Postgres, MySQL, MSSQL) or a NoSQL db (e.g. DynamoDB, Redis, MongoDB)
A queue (e.g. SQS, RabbitMQ) and/or streaming service (e.g. Kafka, Kinesis)
A cloud-native CI/CD environment (e.g. CircleCI, GitHub Actions)
A monitoring system (e.g. Grafana, Datadog)
An incident management system (e.g. PagerDuty, Opsgenie)
Strong stakeholder management, excellent client-facing skills, and the ability to liaise with clients at all levels
Environment templating (e.g. Terraform, Helm, CloudFormation)
Strategic planning
Our main stack (any overlap is a plus)

Python
Databricks
Kedro
AWS
Kafka
Airbyte
Postgres
CircleCI
Grafana
Sisense
Druid
GitHub
Kubernetes (EKS)
Being deprecated (any knowledge here is still a plus)

Druid
Datadog
Nice-to-haves

Data mesh design and implementation
Data modeling
Developing and deploying machine learning solutions
Data Visualisations
Data Architecture

84,000 USD per year

Full-time

București, RO

Skills / abilities

  • Data Engineering
