Large clusters and supercomputers evolve rapidly and may undergo regular hardware updates, which increases their chances of becoming heterogeneous. Even homogeneous clusters may exhibit variable performance due to processor manufacturing variability, or may include partitions equipped with different types of accelerators. Distributing data over heterogeneous nodes is challenging but essential to exploit all resources efficiently. In this article, we build upon the flexibility of task-based runtime systems to study the interplay between static, communication-aware data distribution strategies and the dynamic scheduling of the LU factorization over heterogeneous sets of hybrid nodes. We propose two techniques derived from state-of-the-art 1D×1D data distributions. First, using fewer computing nodes towards the end of the factorization to better match performance bounds and save computing power. Second, carefully moving a few blocks between nodes to further improve load balancing. We also show that 1D×1D data distributions, tailored for heterogeneous nodes, can scale better than classical block-cyclic distributions on homogeneous clusters. We validate these techniques in both real and simulated environments, on homogeneous and heterogeneous platforms, and demonstrate compelling performance improvements.