Deep Learning (DL) frameworks achieve excellent performance in several tasks, although the demand for greater computational resources frequently prevents the use of more complex models and larger datasets. Learning from massive datasets in feasible time is one of the new challenges of the European-funded project PROCESS, which aims to provide user-friendly access to High Performance Computing (HPC) centres and to extend HPC from task-specific to general-purpose applications. In this context, we investigate the challenges of distributing computations among thousands of cores and hundreds of GPUs, highlighting future prospects and current limitations.