This thesis presents a set of works on the use of deep learning (DL) in the next generation of communications systems. Digital communications is a cornerstone technology of the modern information age, so a natural question is whether DL enables better solutions for common problems in communications systems. To that end, this thesis poses two research questions. RQ1: How can deep learning techniques be effectively applied across different parts of the networking stack? RQ2: What are the advantages and challenges of implementing deep learning in the physical layer versus the application layer?

This thesis consists of two parts with three chapters each. The first three content chapters discuss works that relate to the physical layer of the networking stack; the second three relate to the application layer.

Chapter 2 focuses on improving deep learning in the physical layer of communication networks. We develop a more flexible neural receiver that can handle multiple modulation types, where a modulation type is a signaling configuration that trades off data rate against reliability. This approach makes deep-learning-based communication systems modular, and thereby more adaptable and efficient.

Chapter 3 continues work on the physical layer, tackling the challenge of making deep learning receivers faster and more energy-efficient. We use group-equivariant deep learning to build neural networks that inherently respect certain symmetries of radio signals. This results in smaller, more efficient networks that perform just as well as larger ones.

Chapter 4 shifts to underwater acoustics. We use contrastive learning to train neural networks without labeled data. This approach could improve underwater sound classification and potentially help simulate underwater communication channels more accurately.
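To give a flavor of the contrastive idea used in Chapter 4, the following is a minimal sketch of an InfoNCE-style objective: two embeddings of the same unlabeled example are pulled together while all other pairs in the batch act as negatives. The function name, shapes, and temperature value are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss for paired batches of embeddings.

    z1[i] and z2[i] are embeddings of two views of the same (unlabeled)
    example; every other pairing in the batch serves as a negative.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # pairwise similarity matrix
    # Cross-entropy with the matching pair (the diagonal) as the target.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

No labels appear anywhere: the supervisory signal comes entirely from knowing which pairs of views originate from the same example, which is what makes the approach attractive for unlabeled underwater acoustic data.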
Chapter 5 explores a security vulnerability in video streaming platforms such as YouTube. Videos are generally streamed dynamically, a practice known to introduce vulnerabilities. Using deep learning techniques, specifically deep-metric learning, we demonstrate that video IDs can be identified from encrypted streams with only a handful of examples. This work highlights how DL simplifies exploiting vulnerabilities across communication layers.

Chapter 6 focuses on improving neural image compression. Building upon the work of Yang et al. (2020), we propose SGA+, a set of functions that improve the weighting of Gumbel probabilities. Our approach, particularly the SSL function with hyperparameter a, converges faster and achieves better PSNR/BPP trade-offs than the original SGA method. Advances in neural image compression are important for reducing internet congestion and improving edge-device capabilities.

Chapter 7 investigates the application of Federated Learning (FL) to Face Recognition (FR) to address privacy concerns and issues with data heterogeneity. Given the challenge that identities are not shared across parties, we propose federated meta-learning. Our approach improves overall performance per client, especially under heterogeneous data splits. Notably, the performance gains primarily benefit weaker clients, reducing the variance in performance across clients.

Together, these works provide partial answers to the research questions posed above. Major factors in applying DL across network layers are modularity, the translation of DL research into communications practice, and the integration of DL advances into the communications stack of tomorrow. Notably, there are differences between the application layer and the physical layer: this thesis found the physical layer to benefit more broadly from the inclusion of DL. However, the models used throughout this work were highly similar; they were all based on convolutional neural networks (CNNs).
The future of DL for communications looks bright, and much work remains to be done.