Course Outline
Module I: Fundamentals of Large Language Models
1. How LLMs work: tokenization, context windows, and hallucinations
2. Prompt engineering techniques for optimal results
3. Communication with models via the OpenAI API
4. Practical use of the LangChain framework
5. Architecture of Retrieval-Augmented Generation (RAG) systems
Module II: Preparation of Text Documents
1. Extracting content from different file formats (PDF, DOCX, TXT)
2. The concept of dividing text into segments (chunking)
3. Document segmentation strategies
4. Impact of chunk size on system quality
Module III: Vectorization and Data Storage
1. Principles of text vector representation (embeddings)
2. Semantic search based on vector similarity
3. Qdrant vector database: configuration and application in RAG
4. Indexing and managing collections of vectors
Module IV: Document Retrieval and Quality Assessment
1. The retrieval process in RAG systems
2. Constructing context for queries to LLMs
3. Reranking retrieved results
4. DeepEval framework for evaluating RAG systems
Module V: Web Application for the RAG System
1. Basics of the Streamlit library
2. Implementing the user interface
3. Integrating RAG system components into the web application