Lead Big Data Developer for Content Platform
The team is currently working on a Data Warehouse and Big Data project for our client, the most trusted and esteemed source of visual content in the world, with over 200 million assets available through its industry-leading sites. It serves creative, business and media customers in almost 200 countries and is the first place people turn to discover, purchase and share powerful content from the world’s best photographers and videographers. The company cooperates with over 200,000 contributors and hundreds of image partners to provide comprehensive coverage of more than 130,000 news, sport and entertainment events, impactful creative imagery to communicate any commercial concept and the world’s deepest digital archive of historic photography.
The project is currently at the stage of re-platforming from a SQL data warehouse to Snowflake and Looker.
Who You Are:
You are motivated by the technical challenges that come with structured and unstructured data at an enterprise level. Even more, you are energized by bringing solutions and innovations that help the business move forward.
You are passionate about building data platforms and frameworks, and about deriving insights from complex, multi-structured datasets.
Responsibilities:
- Design, implement and deliver AWS-based analytical solutions
- Develop and maintain high-performing ETL/ELT processes, including data quality checks and testing
- Own the data infrastructure including provisioning, monitoring and automation of infrastructure and application deployments
- Instrument monitoring and alerting
- Design and build data models for the Snowflake warehouse and the Hadoop-based enterprise data lake
- Create and maintain infrastructure and application documentation
- Develop dashboards, reports and visualizations
- Ensure scalability and high performance of the platform
- Design and enhance internally developed frameworks in Python
Requirements:
- MS/BS degree in computer science or related field
- 5+ years of hands-on experience designing and implementing data solutions that can handle terabytes of data
- Strong knowledge of modern distributed architectures and of compute, data analytics and storage technologies on AWS
- Good understanding of infrastructure choices, sizing and cost of cloud infrastructure and services
- Hands-on experience with Amazon Redshift, Snowflake or Google BigQuery
- Hands-on experience administering, designing, developing and maintaining software solutions on production Hadoop clusters
- Solid understanding of architectural principles and design patterns for large-scale parallel and distributed frameworks such as Hadoop and Spark
- Experience with Spark and Hive
- Solid experience with Python
- Experience with Terraform and Docker
- Experience with open-source job orchestration tools such as Airflow or Job Scheduler
- Experience with reporting and visualization tools such as Looker or Tableau is a plus
- Outstanding analytical skills, strong teamwork and a delivery mindset
- Experience in performance troubleshooting, SQL optimization, and benchmarking
- Experience working in a UNIX environment, including writing shell scripts
- Experience with Agile methodologies
- Upper-Intermediate or higher English level (B2+)