We are currently looking for a remote Lead Big Data Software Engineer with 5+ years of production experience with Spark (PySpark) to join our team.
The customer is a biotechnology company engaged in the discovery, invention, development, manufacture, and commercialization of medicines.
The main goal is to build a solution that consumes and stores data from multiple customer domains.
- Implement a pipeline processing application using PySpark and Airflow
- Integrate required database structure
- Build data marts in Hive and PostgreSQL
- Create analytical SQL scripts in PostgreSQL or another database
- Communicate with English-speaking colleagues and customer representatives
- 5+ years of production experience with Spark (PySpark)
- Extensive knowledge and experience with Apache Airflow
- Familiarity with Apache Hive
- 1+ year of relevant leadership experience
- English level B2+
- Working experience with AWS services: S3, Athena, EC2