

# Introduction
I know a lot of people want to study LLMs in depth, and while courses are great for building broad knowledge, you need books for a really deep understanding. Another thing I personally love about books is their structure: they follow an order that is more intuitive and coherent than courses, which can sometimes feel all over the place. With this motivation, we are starting a new series recommending five free but thoroughly worthwhile books on different subjects. So, if you're serious about understanding how large language models (LLMs) really work, here are my recommendations: five free books, and that's where you should start.
# 1. Foundations of Large Language Models
Published in early 2025, Foundations of Large Language Models is one of the most well-structured and conceptually clear books for anyone who really wants to understand how an LLM is built, trained, and aligned. The authors, Tong Xiao and Jingbo Zhu, are both well-known figures in natural language processing (NLP). Instead of chasing every new architecture or trend, they carefully explain the underlying mechanisms behind modern models like GPT, BERT, and LLaMA.
The book emphasizes fundamental thinking: what pre-training actually means, how generative models work internally, why prompting strategies matter, and what is really involved when humans try to fine-tune model behavior. I think it strikes a thoughtful balance between theory and practice, designed for both students and practitioners who want to build a strong conceptual foundation before experimenting.
// Outline review
- Pre-training (overview, BERT, practical aspects of adapting and applying pre-trained models, etc.)
- Generative models (decoder-only Transformers, data preparation, distributed training, scaling laws, memory optimization, efficiency strategies, etc.)
- Prompting (principles of good prompt design, advanced prompting methods, techniques for refining prompts)
- Alignment (LLM alignment and RLHF, instruction tuning, reward modeling, preference optimization)
- Inference (decoding algorithms, evaluation metrics, guidance on efficient inference methods)
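To make the decoding discussion above concrete, here is a minimal sketch (not from the book) of temperature-scaled sampling, one of the basic decoding strategies such chapters cover. The function name and logit values are illustrative:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token id from logits after temperature scaling.

    Returns the sampled index and the full probability distribution.
    """
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

# Lower temperature sharpens the distribution toward the top logit;
# higher temperature flattens it, increasing output diversity.
_, cold = sample_with_temperature([2.0, 1.0, 0.1], temperature=0.5)
_, hot = sample_with_temperature([2.0, 1.0, 0.1], temperature=2.0)
```

With temperature 0.5 the first token's probability is much higher than with temperature 2.0, which is why low temperatures give more deterministic output.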
# 2. Speech and Language Processing
If you want to understand NLP and LLMs in depth, Speech and Language Processing by Dan Jurafsky and James H. Martin is one of the best resources. The third edition draft (released August 24, 2025) has been completely updated to cover modern NLP, including Transformers, LLMs, automatic speech recognition (Whisper), and text-to-speech systems (EnCodec and VALL-E). Jurafsky and Martin are leaders in computational linguistics, and their book is widely used at top universities.
The book provides a clear, structured path from basics like tokens and embeddings to advanced topics such as LLM training, alignment, and dialogue systems. The PDF draft is freely available, making it both practical and accessible.
// Outline review
- Volume I: Major Models of Language
- Chapters 1–2: Introduction, Vocabulary, Tokens, and Unicode Handling
- Chapters 3–5: N-gram LMs, Logistic Regression for Text Classification, and Vector Embeddings
- Chapters 6–8: Neural Networks, LLMs, and Transformers—including sampling and training techniques
- Chapters 9–12: Post-Training Fine-Tuning, Masked Language Models, Information Retrieval and Chatbots, and Machine Translation
- Chapter 13: RNNs and LSTMs (an optional path for learning sequence models)
- Chapters 14–16: Acoustics, Speech Feature Extraction, Automatic Speech Recognition (Whisper), and Text-to-Speech (EnCodec and VALL-E)
- Volume II: Interpreting Linguistic Structure
- Chapters 17–25: Sequence Labeling for POS and NER, CFGs, Dependency Parsing, Information Extraction, Semantic Role Labeling, Lexicons, Coreference Resolution, Discourse Coherence, and Discourse Structure
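As a taste of the early chapters, here is a minimal sketch (my own, not from the book) of the maximum-likelihood bigram language model that n-gram chapters traditionally build up to. The tiny corpus and function name are illustrative:

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count bigrams over a list of sentences and return an MLE
    estimator for P(next | prev), with <s>/</s> boundary markers."""
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[prev][nxt] += 1

    def prob(prev, nxt):
        # MLE: count(prev, nxt) / count(prev); 0.0 for unseen contexts.
        return bigrams[prev][nxt] / unigrams[prev] if unigrams[prev] else 0.0

    return prob

prob = train_bigram_lm(["the cat sat", "the cat ran", "a dog sat"])
# In this corpus "the" is always followed by "cat", so P(cat | the) = 1.
```

Real n-gram models add smoothing so unseen bigrams do not get zero probability, which is exactly the refinement those chapters spend most of their time on.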
# 3. How to Scale Your Model: A Systems View of LLMs on TPUs
Training LLMs is difficult because the numbers are huge, the hardware is complex, and it is hard to know where the bottlenecks are. How to Scale Your Model: A Systems View of LLMs on TPUs takes a very practical, systems-oriented approach to the performance side of LLMs: how tensor processing units (TPUs) (and GPUs) work under the hood, how these devices communicate with each other, and how LLMs actually run on real hardware. It also covers parallelism strategies for training and serving models efficiently at massive scale.
This resource stands out because the authors have actually built production-grade LLM systems at Google themselves, and they share what they learned along the way.
// Outline review
- Part 0: Rooflines (understanding hardware constraints: FLOPs, memory bandwidth, memory capacity)
- Part 1: TPUs (How TPUs Work and Network Together for Multi-Chip Training)
- Part 2: Sharding (matrix multiplication, TPU communication costs)
- Part 3: Transformer math (calculating FLOPs, bytes, and other key metrics for Transformers)
- Part 4: Training (Parallelization Strategies: Data Parallelism, Fully Sharded Data Parallelism (FSDP), Tensor Parallelism, Pipeline Parallelism)
- Part 5: Training LLaMA (a practical example of training LLaMA 3 on TPU v5p: cost, sharding, and size considerations)
- Part 6: Inference (latency considerations, efficient sampling, and accelerator utilization)
- Part 7: Serving LLaMA (serving LLaMA 3-70B on TPU v5e: KV caches, batch sizes, sharding, and production latency estimates)
- Part 8: Profiling (practical optimization using XLA compiler and profiling tools)
- Part 9: JAX (Programming TPUs Efficiently with JAX)
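The "Transformer math" part above is the kind of back-of-the-envelope accounting you can reproduce in a few lines. Here is a sketch (mine, not the book's) of the widely used approximation of roughly 6 FLOPs per parameter per training token (2 forward, 4 backward); the chip counts, peak FLOP/s, and MFU figure below are illustrative assumptions, not quoted numbers:

```python
def training_flops(n_params, n_tokens):
    """Total training compute under the ~6*N*D approximation:
    ~6 FLOPs per parameter per token (2 forward + 4 backward)."""
    return 6 * n_params * n_tokens

def training_days(n_params, n_tokens, n_chips, peak_flops_per_chip, mfu=0.4):
    """Rough wall-clock estimate given chip count, per-chip peak FLOP/s,
    and model FLOPs utilization (MFU); all inputs are illustrative."""
    seconds = training_flops(n_params, n_tokens) / (
        n_chips * peak_flops_per_chip * mfu
    )
    return seconds / 86400  # seconds per day

# Hypothetical run: a 70B-parameter model on 15T tokens across 16,384
# accelerators, each with ~4.6e14 peak FLOP/s, at 40% utilization.
days = training_days(70e9, 15e12, 16384, 4.6e14, mfu=0.4)
```

Estimates like this are how you sanity-check whether a proposed training run is measured in days, months, or lifetimes before touching any hardware.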
# 4. Understanding Large Language Models: Towards Rigorous and Targeted Interpretability Using Probing Classifiers and Self-Rationalization
Understanding Large Language Models: Towards Rigorous and Targeted Interpretability Using Probing Classifiers and Self-Rationalization is not a typical textbook. It is Jenny Kunz's doctoral thesis from Linköping University, but it covers such a unique aspect of LLMs that it deserves a place on this list. She explores how large language models work internally and how we can better understand them.
LLMs perform very well on many tasks, but it is not clear how they arrive at their predictions. The thesis explores two ways of understanding these models: probing the internal layers with classifiers to see what information each layer stores, and studying models that generate free-text explanations alongside their own predictions. She also examines which properties of these explanations actually help downstream tasks and which align with human intuition. This work is useful for researchers and engineers interested in building more transparent and trustworthy AI systems.
// Outline review
- Understanding LLM layers with probing classifiers (analyzing the information stored in each layer of the model, testing the limitations of existing probing methods, building more rigorous probing tests using controlled changes to the data, and developing new ways to measure how layers differ in what they encode)
- Explaining predictions with self-rationalizing models (generating free-text explanations alongside model predictions, comparing explanations with human ratings and task performance, studying which properties make explanations useful for downstream tasks, and examining human-like qualities of explanations and their effects on different users)
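The probing idea is simple to demonstrate: freeze a model, take the activations from one layer, and train a small classifier on them; if the probe succeeds, that layer encodes the property. Below is a minimal NumPy-only sketch (my own illustration, not code from the thesis) using synthetic "hidden states" in place of real model activations:

```python
import numpy as np

def train_linear_probe(activations, labels, lr=0.5, steps=500):
    """Fit a logistic-regression probe on frozen layer activations
    (binary labels) with plain batch gradient descent."""
    X = np.asarray(activations, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        grad = p - y                             # dLoss/dlogit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    # Return a classifier: True where the predicted probability > 0.5.
    return lambda Xn: 1.0 / (1.0 + np.exp(-(np.asarray(Xn, float) @ w + b))) > 0.5

# Synthetic "activations": class-0 vectors near -1, class-1 near +1.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-1, 0.3, (50, 4)), rng.normal(1, 0.3, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
probe = train_linear_probe(X, y)
accuracy = float((probe(X) == y).mean())
```

A core argument of the thesis is that high probe accuracy alone is not enough: you need controls (such as shuffled labels or perturbed data) to rule out the probe itself doing the work.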
# 5. Large Language Models in Cybersecurity: Threats, Exposure and Mitigation
LLMs are very powerful, but they can also create risks such as leaking private information, aiding phishing attacks, or introducing code vulnerabilities. Large Language Models in Cybersecurity: Threats, Exposure and Mitigation explains these risks and shows how to reduce them. It covers real-world examples, including social engineering, monitoring LLM adoption, and building secure LLM systems.
This resource is unique because it focuses on LLMs in cybersecurity, which most LLM books do not cover. It is very useful for anyone who wants to understand both the risks and the defenses related to LLMs.
// Outline review
- Part I: Introduction (how LLMs work and how they are used, limitations of LLMs and evaluation of their capabilities)
- Part II: LLMs in Cybersecurity (risks of private-information leakage, phishing and social-engineering attacks, threats from code suggestions, LLM-assisted influence operations and web indexing)
- Part III: Tracking and forecasting exposure (monitoring LLM adoption and risks, investment and insurance aspects, trends in copyright and legal issues, new research in LLM)
- Part IV: Mitigation (security education and awareness, privacy-preserving training methods, defenses against attacks and adversarial use, LLM detectors, red teaming, and security standards)
- Part V: Conclusion (the dual role of LLMs in creating threats and providing defenses, recommendations for safe LLM use)
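To make the leakage-mitigation theme concrete, here is a toy output filter of the kind Part IV's defenses formalize: scanning model output for patterns that look like secrets before it reaches a user. This is my own illustrative sketch, not from the book, and the pattern list is deliberately tiny; real deployments use vetted secret scanners and PII detectors, not a three-entry regex table:

```python
import re

# Illustrative patterns only; a production scanner needs far more coverage.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
}

def scan_output(text):
    """Return the names of all leak patterns that match the model output."""
    return [name for name, pat in LEAK_PATTERNS.items() if pat.search(text)]

findings = scan_output("Contact admin@example.com, key AKIAABCDEFGHIJKLMNOP")
```

Even a crude filter like this illustrates the book's dual-role point: the same pattern matching that blocks a model from leaking a credential can also help defenders audit what their LLM systems emit.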
# Wrapping up
All five books approach LLMs from very different angles: theory, linguistics, systems, interpretability, and security. Together, they form a complete learning path for anyone serious about mastering large language models. If you liked this article, let me know in the comments which topics you want to explore next.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a 2022 Google Generation Scholar for APAC, she champions diversity and academic excellence. She has also been recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a passionate advocate for change, having founded FEMCodes to empower women in STEM fields.