For This Role
We are the Data Engineering Service: we collect, preprocess, and store analytical data for data-driven decision-making, powering product and process improvements through automation, all types of data analysis, and machine learning across the company.
We are seeking a Middle Data Engineer to join our team. You will play a crucial role in managing terabytes of data and integrating technologies such as Spark and Delta Lake. You'll be engaged in the complete data preparation cycle, from collection to serving data to our data users.
If you’re excited about data engineering and want to make an impact at MacPaw, we’d love to hear from you!
In this role, you will:
- Develop and maintain data tools (Anomaly Detection System, ML platform, etc.) and data pipelines (Data Streaming Processors, ETL for third-party data sources) for other MacPaw teams according to functional and non-functional requirements, ensuring high quality (data reliability, efficiency, accessibility, etc.)
- Analyze (how should it be done?) and validate (can it be done?) technical solutions for business problems, ideas, and needs; suggest alternative approaches to implementation and reason about their pros and cons
- Decompose validated technical solutions into technical tasks and estimate them
- Research and suggest architectural solutions and/or development tools to implement the technical task
- React to issues and failures by investigating and fixing them according to service task priorities
- Develop and maintain documentation, code, and business logic according to service requirements
- Communicate with other teams to clarify requests, implementation details, and edge cases, specify input or missing data and possible use cases/flows, and collaborate constructively to implement solutions successfully
Skills you’ll need to bring:
- Proficiency in Python for data pipelines and advanced SQL (complex transformations, query optimization, window functions)
- Experience building and optimizing large-scale data processing jobs with PySpark
- Experience designing and building ETL/ELT workflows with Airflow
- Experience with real-time data processing using message brokers like Kafka or RabbitMQ
- Experience with data warehouses (BigQuery, Redshift), RDBMS (PostgreSQL preferred), and in-memory stores (Redis preferred)
- Experience with a major cloud provider (GCP preferred), using services for storage, compute, and IAM
- Experience with containerization using Docker (building images, managing containers)
- At least an Intermediate level of English and fluent Ukrainian
As a plus:
- Knowledge of dbt
- Understanding of open table formats (Delta Lake, Apache Iceberg)
- Knowledge of data lakehouse concepts
- Experience building data APIs or services using backend frameworks (e.g., FastAPI, Django)