Customer Story

Scalable Scala Spark ETL platform for global prescription analytics

We designed and deployed a Scala Spark ETL platform to transform billions of prescription records from Parquet on HDFS into timely, global insights for pharmaceutical decision-makers.

Location

UK

Industry

Life Sciences & Healthcare

Overview

We delivered a scalable Scala Spark ETL platform for a global life sciences and healthcare analytics organisation. It transforms billions of prescription records stored as Parquet files on HDFS into timely, globally consistent insights that support pharmaceutical marketing and product performance analytics across regions. Architecture and delivery were employee-led, and the design prioritised maintainability and auditable data lineage.

Problem

The organisation faced a data processing challenge at scale: billions of rows of prescription data stored in Parquet on HDFS, handled by existing processes that were slow and difficult to optimise. Inaccurate insights risked misdirecting marketing and product decisions across multiple regions. The organisation needed accurate outputs, predictable nightly workloads, and infrastructure that remained manageable.

Solution

We designed and implemented a Scala Spark ETL platform to transform raw prescription data into actionable global insights, delivering reliable and timely analytics for pharmaceutical decision-makers.

Key actions included:

Architected distributed ETL pipelines in Scala and Apache Spark
Optimised processing for billions of rows of medical data
Implemented rigorous unit testing to validate against SQL extracts
Deployed nightly batch processes to generate global marketing insights
Streamlined infrastructure and data handling for efficiency and accuracy
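
As an illustration of the pipeline shape described in the actions above, the following is a minimal sketch of a nightly Spark batch job in Scala. It is not the client's actual code: the object name, the column names (region, product_code, quantity), and the two path arguments are assumptions made for the example. The transformation is kept separate from the read and write steps so that it can be unit tested on small in-memory data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, lit, sum}

// Hypothetical nightly job; names and columns are illustrative only.
object PrescriptionInsightsJob {

  // Pure DataFrame-to-DataFrame step: aggregates prescriptions per
  // region and product. No I/O happens here, which keeps it testable.
  def summarise(prescriptions: DataFrame): DataFrame =
    prescriptions
      .groupBy(col("region"), col("product_code"))
      .agg(
        count(lit(1)).as("prescription_count"),
        sum(col("quantity")).as("total_quantity")
      )

  def main(args: Array[String]): Unit = {
    // Assumed arguments: input and output locations, e.g. HDFS paths.
    val Array(inputPath, outputPath) = args

    val spark = SparkSession.builder()
      .appName("prescription-insights-nightly")
      .getOrCreate()

    try {
      // Read the raw Parquet data, aggregate, and write the result
      // partitioned by region, replacing the previous night's output.
      summarise(spark.read.parquet(inputPath))
        .write
        .mode("overwrite")
        .partitionBy("region")
        .parquet(outputPath)
    } finally {
      spark.stop()
    }
  }
}
```

Separating summarise from main is a design choice rather than a requirement: it lets the same function run against billions of rows at night and against a handful of fixture rows in a test.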

Impact

The platform enabled consistent nightly processing of billions of records and delivered accurate, validated insights for global pharmaceutical leadership. Infrastructure efficiency and reliability improved at scale, providing timely, actionable intelligence to support marketing and product strategy across regions.

Stack & Approach

Tech stack: Scala, Apache Spark, HDFS, Parquet, SQL. The approach emphasised data quality, repeatability, and observability, and followed an employee-led delivery model to support knowledge transfer and long-term resilience. Transformations were validated against SQL extracts via unit tests, and fixed nightly batch windows kept outputs predictable.
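
To make the validation step more concrete, here is a minimal sketch in plain Scala of how ETL output might be compared with a reference SQL extract inside a unit test. The row shape, the object name, and the message wording are assumptions for illustration, not the project's actual test code. It works on small in-memory collections, as a test fixture would, and assumes each (region, product code) pair appears at most once per input.

```scala
// Hypothetical row shape shared by the ETL output and the SQL extract.
final case class InsightRow(region: String, productCode: String, totalQuantity: Long)

object ExtractValidation {
  type Key = (String, String)

  // Returns one message per discrepancy, ordered by key.
  // An empty result means the ETL output matches the extract exactly.
  def diff(actual: Seq[InsightRow], expected: Seq[InsightRow]): Seq[String] = {
    // Index rows by (region, productCode); assumes keys are unique.
    def index(rows: Seq[InsightRow]): Map[Key, Long] =
      rows.map(r => (r.region, r.productCode) -> r.totalQuantity).toMap

    val etl = index(actual)
    val extract = index(expected)

    (etl.keySet ++ extract.keySet).toSeq.sorted.flatMap { key =>
      (etl.get(key), extract.get(key)) match {
        case (Some(x), Some(y)) if x == y => None
        case (Some(x), Some(y)) => Some(s"$key: ETL produced $x, extract has $y")
        case (Some(_), None)    => Some(s"$key: missing from SQL extract")
        case (None, Some(_))    => Some(s"$key: missing from ETL output")
        case (None, None)       => None
      }
    }
  }
}
```

A test would then assert that ExtractValidation.diff(etlRows, extractRows) is empty. Reporting every mismatch, rather than stopping at the first, makes a failing nightly comparison easier to diagnose.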
