Introduction to Data Science and Artificial Intelligence with Python
Course description
Overview
A four-day hands-on workshop course that guides participants from data analysis and preparation in Python (Pandas), through building and evaluating machine learning models (regression and classification), to a practical introduction to large language models (LLMs) and their applications in analytical and product work. The sessions combine short theoretical blocks with intensive data exercises, so that participants complete the course with a set of ready-to-use techniques and best practices to apply in everyday tasks.
Course objectives
After the course, participants will be able to:
- prepare data for analysis (cleaning, filtering, aggregations, feature engineering) using Pandas
- build, evaluate, and compare regression and classification models in scikit-learn
- apply techniques to improve model quality: normalization, categorical variable encoding, cross-validation, and hyperparameter tuning
- identify typical risks (e.g. overfitting) and choose appropriate evaluation metrics
- understand the basics of working with LLMs (APIs, embeddings, multimodal models) and safely prototype simple solutions
Book the course
- Format: Remote
- Language: Polish
- Type: Public course
- Date: 24-27.02.2026
- Duration: 4 days (7h/day)
- Trainer: Patryk Palej
- Validator: Bartosz Wójcik
Target audience
The course is intended for people who want to develop data/AI skills in practice, in particular:
- data analysts, business analysts, and reporting specialists
- people from finance, sales, operations, and marketing teams working with data
- developers and engineers who want to organize ML/LLM fundamentals in Python
- people preparing for a Data Scientist / ML Engineer role at junior/regular level
Prerequisites
- basic knowledge of Python (variables, loops, functions, working in a notebook)
- basic data skills (tables, data types, simple calculations)
- readiness to work on your own computer in a remote environment
Training methods
The course is predominantly hands-on (approx. 75% of the time). Sessions use, among other methods:
- individual and group exercises in notebooks (Python)
- short theoretical introductions preceding labs
- case studies based on data close to real business applications
- mini-projects summarizing each day and work on pipelines
- Q&A sessions and consultations on problems from participants’ own work (if they provide data/cases)
Training materials
Participants receive a complete set of materials used during the course, including:
- notebooks (Jupyter) and exercise files
- datasets for labs
- cheat sheets with key Pandas functions and scikit-learn components
- links to recommended sources and documentation
Technical requirements for remote training
To participate, the following are required:
- computer with Windows/macOS/Linux (min. 8 GB RAM, 16 GB recommended)
- stable internet connection (min. 10 Mb/s)
- headset with microphone and a camera (recommended for workshop work)
- access to MS Teams or Zoom
- optional: access to the DaDesktop environment provided by NobleProg (if launched for the group)
Validation and certificates
Learning outcomes are validated based on practical tasks (notebooks), mini-projects, and quality checklists. After completing the course, participants receive a NobleProg certificate in electronic form.
Information document
The program can be adapted to the group’s needs after requirements analysis (PCQ).
Learning outcomes and verification criteria
Data analysis in Pandas
- Creates and modifies Series/DataFrame objects, performs operations on tables and columns
- Applies filtering, sorting, grouping, and aggregations to solve analytical problems
Verification: practical tasks in a notebook (calculation results and correctness of data transformations).
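As a taste of the kind of notebook task used for verification, here is a minimal sketch of filtering and groupby aggregation in Pandas. The table and column names are invented for illustration; course datasets will differ.

```python
import pandas as pd

# Toy sales table (invented for illustration)
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100.0, 150.0, 80.0, 120.0, 90.0],
})

# Boolean filtering: keep rows with revenue of at least 100
high = df[df["revenue"] >= 100]

# Group by region and compute two aggregations per group
per_region = df.groupby("region")["revenue"].agg(["sum", "mean"])
```

The result of `per_region.loc["South", "sum"]` is the total revenue for the South region (290.0 in this toy table).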
Regression models
- Selects and trains a regression model for a given problem
- Evaluates the model with appropriate metrics and interprets the results
- Applies normalization and data preparation, and mitigates overfitting
Verification: regression mini-project + discussion of metrics and conclusions.
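The regression workflow above can be sketched in a few lines of scikit-learn: split the data, train a model, evaluate with MAE, RMSE, and R². The synthetic dataset is invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on x, plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out a test set so the metrics reflect generalization, not memorization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
```

Interpreting all three metrics together, rather than any single one, is part of the mini-project discussion.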
Classification models
- Builds and compares classification models, selecting metrics (e.g. accuracy, precision/recall, F1, ROC-AUC)
- Applies ensemble techniques to improve prediction quality
Verification: classification task + comparison of at least 2 models and justification of the choice.
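Comparing two models on shared metrics, as the verification task requires, might look like the following sketch (synthetic data; the two model choices here are illustrative, not prescribed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for name, clf in [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(random_state=0)),
]:
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]  # scores needed for ROC-AUC
    results[name] = {
        "f1": f1_score(y_te, clf.predict(X_te)),
        "roc_auc": roc_auc_score(y_te, proba),
    }
```

The justification of the choice then rests on which metrics matter for the business problem (e.g. recall for rare positives), not on a single headline number.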
Working with LLMs
- Understands basic concepts: prompt, context, embeddings, tokenization, model limitations
- Uses an LLM API to generate and process text and creates embeddings for semantic search
- Knows basic safety and quality principles (data protection, testing, limitations)
Verification: integration exercise (simple prototype) + data security checklist.
Course agenda
Day 1 - Data analysis in Pandas
- Basic data types: Series and DataFrame (creation, indexing, data types).
- Table operations: loading data, merging (merge/join), concatenation.
- Filtering, sorting, grouping: groupby, aggregations, pivot table.
- Value modification: mapping, replace, missing data (NaN) and filling strategies.
- Column operations: feature creation and transformation, apply/assign functions.
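Several of the Day 1 topics fit in a short sketch: a left join with `merge` followed by a `pivot_table` aggregation. The tables and column names below are invented for illustration.

```python
import pandas as pd

# Two toy tables to demonstrate merging and pivoting
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [50.0, 70.0, 20.0, 90.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "segment": ["retail", "retail", "business"],
})

# Left join keeps every order; the 'how' argument controls the join type
joined = orders.merge(customers, on="customer_id", how="left")

# Pivot table: total order amount per customer segment
pivot = joined.pivot_table(index="segment", values="amount", aggfunc="sum")
```

Here `pivot.loc["retail", "amount"]` sums the three retail orders (140.0), while the single business order gives 90.0.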
Day 2 - Machine Learning: regression algorithms
- Introduction to regression: problem, data, ML pipeline.
- Model evaluation: metrics (MAE, MSE/RMSE, R²), train/test split.
- Normalization and standardization: when and why.
- Handling categorical variables: One-Hot Encoding and Label Encoding.
- Overfitting: diagnosis and mitigation methods.
- Cross-validation: choosing a validation strategy.
- Grid Search and hyperparameter optimization (including pipeline).
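Several Day 2 topics come together in a scikit-learn `Pipeline` combined with `GridSearchCV`: scaling happens inside each cross-validation fold, and hyperparameters are addressed by step name. The data and parameter grid below are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, invented for the example
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling inside the pipeline avoids leaking test-fold statistics into training
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Hyperparameters are addressed as <step-name>__<parameter>
grid = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="r2",
)
grid.fit(X, y)
```

After fitting, `grid.best_params_` and `grid.best_score_` report the winning configuration and its cross-validated score.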
Day 3 - Machine Learning: classification algorithms
- Introduction to classification: binary and multiclass.
- Classification evaluation: confusion matrix, accuracy, precision/recall, F1, ROC-AUC.
- Overview of classification algorithms (e.g. logistic regression, trees, SVM, kNN).
- Ensemble: combining classifiers (bagging, boosting, stacking - overview and practice).
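One concrete flavor of combining classifiers is soft voting, where the ensemble averages the member models' predicted probabilities. A minimal sketch on synthetic data (the three member models are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=8, random_state=1)

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=1)),
        ("gb", GradientBoostingClassifier(random_state=1)),
    ],
    voting="soft",  # average predicted probabilities across members
)

# Cross-validated accuracy of the ensemble
scores = cross_val_score(voting, X, y, cv=5)
```

Comparing `scores.mean()` against each member's own cross-validated score shows whether the ensemble actually helps on a given dataset.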
Day 4 - Large language models (LLMs)
- Introduction to LLMs: capabilities, limitations, typical business use cases.
- OpenAI API and other models: integration basics, costs, limits, best practices.
- Multimodal models: working with text and images - scenarios and limitations.
- Embeddings: semantic search, clustering, simple recommendations, RAG basics.
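The semantic-search idea behind embeddings (and the retrieval step of RAG) reduces to ranking documents by cosine similarity to a query vector. In practice the vectors come from an embeddings API; the tiny hand-made vectors below exist purely to illustrate the ranking mechanics.

```python
import numpy as np

# Hand-made 3-dimensional "embeddings" (real ones have hundreds
# of dimensions and come from a model API)
corpus = {
    "invoice": np.array([0.9, 0.1, 0.0]),
    "payment": np.array([0.6, 0.5, 0.1]),
    "holiday": np.array([0.0, 0.1, 0.95]),
}
query = np.array([0.85, 0.15, 0.05])  # pretend embedding of a billing question

def cosine(a, b):
    """Cosine similarity: dot product of the unit-normalized vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank corpus documents by similarity to the query, most similar first
ranked = sorted(corpus, key=lambda k: cosine(query, corpus[k]), reverse=True)
```

With these toy vectors, the billing-like query ranks "invoice" first and "holiday" last; swapping in real embeddings changes the vectors but not the search logic.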
No budget available? Get funding!
A program that makes it quick and easy to obtain funding for courses for individual participants.
Why a guaranteed course?
- Guaranteed delivery. The course will take place regardless of the number of participants.
- Knowledge and experience exchange with specialists from other industries.
- Interactive, live-led sessions. Not only theory, but also practical exercises and discussions.
- Flexible remote format. Join from anywhere.
Need Help?
Reach out to learn more about our team and the kinds of tailored solutions we can offer your organization.
Get in Touch