Introduction to Data Science and Artificial Intelligence with Python
Course description
Overview
A four-day hands-on workshop course that guides participants from data analysis and preparation in Python (Pandas), through building and evaluating machine learning models (regression and classification), to a practical introduction to large language models (LLMs) and their applications in analytical and product work. The sessions combine short theoretical blocks with intensive data exercises, so that participants complete the course with a set of ready-to-use techniques and best practices to apply in everyday tasks.
Course objectives
After the course, participants will be able to:
- prepare data for analysis (cleaning, filtering, aggregations, feature engineering) using Pandas
- build, evaluate, and compare regression and classification models in scikit-learn
- apply techniques to improve model quality: normalization, categorical variable encoding, cross-validation, and hyperparameter tuning
- identify typical risks (e.g. overfitting) and choose appropriate evaluation metrics
- understand the basics of working with LLMs (APIs, embeddings, multimodal models) and safely prototype simple solutions
Book the course
- Format: Remote
- Language: Polish
- Type: Public course
- Date: 24-27.02.2026
- Duration: 4 days (7h/day)
- Trainer: Patryk Palej
- Validator: Bartosz Wójcik
Target audience
The course is intended for people who want to develop data/AI skills in practice, in particular:
- data analysts, business analysts, and reporting specialists
- people from finance, sales, operations, and marketing teams working with data
- developers and engineers who want to organize ML/LLM fundamentals in Python
- people preparing for a Data Scientist / ML Engineer role at junior/regular level
Prerequisites
- basic knowledge of Python (variables, loops, functions, working in a notebook)
- basic data skills (tables, data types, simple calculations)
- readiness to work on your own computer in a remote environment
Training methods
The course is predominantly hands-on (approx. 75% of the time). Sessions use, among other methods:
- individual and group exercises in notebooks (Python)
- short theoretical introductions preceding labs
- case studies based on data close to real business applications
- mini-projects summarizing each day and work on pipelines
- Q&A sessions and consultations on problems from participants’ own work (if they provide data/cases)
Training materials
Participants receive a complete set of materials used during the course, including:
- notebooks (Jupyter) and exercise files
- datasets for labs
- cheat sheets with key Pandas functions and scikit-learn components
- links to recommended sources and documentation
Technical requirements for remote training
To participate, the following are required:
- computer with Windows/macOS/Linux (min. 8 GB RAM, 16 GB recommended)
- stable internet connection (min. 10 Mb/s)
- headset with microphone and a camera (recommended for workshop work)
- access to MS Teams or Zoom
- optional: access to the DaDesktop environment provided by NobleProg (if launched for the group)
Validation and certificates
Learning outcomes are validated based on practical tasks (notebooks), mini-projects, and quality checklists. After completing the course, participants receive a NobleProg certificate in electronic form.
Information document
The program can be adapted to the group’s needs after requirements analysis (PCQ).
Learning outcomes and verification criteria
Data analysis in Pandas
- Creates and modifies Series/DataFrame objects, performs operations on tables and columns
- Applies filtering, sorting, grouping, and aggregations to solve analytical problems
Verification: practical tasks in a notebook (calculation results and correctness of data transformations).
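As a taste of the kind of notebook task used for verification, here is a minimal sketch of filtering and groupby aggregation in Pandas. The table and column names are invented for illustration; course datasets will differ.

```python
import pandas as pd

# Toy sales table (invented for illustration)
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100.0, 150.0, 80.0, 120.0, 90.0],
})

# Boolean filtering: keep rows with revenue of at least 100
high = df[df["revenue"] >= 100]

# Group by region and compute two aggregations per group
per_region = df.groupby("region")["revenue"].agg(["sum", "mean"])
```

The result of `per_region.loc["South", "sum"]` is the total revenue for the South region (290.0 in this toy table).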
Regression models
- Selects and trains a regression model for a given problem
- Evaluates the model with appropriate metrics and interprets the results
- Applies normalization and data preparation, and mitigates overfitting
Verification: regression mini-project + discussion of metrics and conclusions.
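The regression workflow above can be sketched in a few lines of scikit-learn: split the data, train a model, evaluate with MAE, RMSE, and R². The synthetic dataset is invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on x, plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out a test set so the metrics reflect generalization, not memorization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
```

Interpreting all three metrics together, rather than any single one, is part of the mini-project discussion.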
Classification models
- Builds and compares classification models, selecting metrics (e.g. accuracy, precision/recall, F1, ROC-AUC)
- Applies ensemble techniques to improve prediction quality
Verification: classification task + comparison of at least 2 models and justification of the choice.
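Comparing two models on shared metrics, as the verification task requires, might look like the following sketch (synthetic data; the two model choices here are illustrative, not prescribed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for name, clf in [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(random_state=0)),
]:
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]  # scores needed for ROC-AUC
    results[name] = {
        "f1": f1_score(y_te, clf.predict(X_te)),
        "roc_auc": roc_auc_score(y_te, proba),
    }
```

The justification of the choice then rests on which metrics matter for the business problem (e.g. recall for rare positives), not on a single headline number.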
Working with LLMs
- Understands basic concepts: prompt, context, embeddings, tokenization, model limitations
- Uses an LLM API to generate and process text and creates embeddings for semantic search
- Knows basic safety and quality principles (data protection, testing, limitations)
Verification: integration exercise (simple prototype) + data security checklist.
Course agenda
Day 1 - Data analysis in Pandas
- Basic data types: Series and DataFrame (creation, indexing, data types).
- Table operations: loading data, merging (merge/join), concatenation.
- Filtering, sorting, grouping: groupby, aggregations, pivot table.
- Value modification: mapping, replace, missing data (NaN) and filling strategies.
- Column operations: feature creation and transformation, apply/assign functions.
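Several of the Day 1 topics fit in a short sketch: a left join with `merge` followed by a `pivot_table` aggregation. The tables and column names below are invented for illustration.

```python
import pandas as pd

# Two toy tables to demonstrate merging and pivoting
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [50.0, 70.0, 20.0, 90.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "segment": ["retail", "retail", "business"],
})

# Left join keeps every order; the 'how' argument controls the join type
joined = orders.merge(customers, on="customer_id", how="left")

# Pivot table: total order amount per customer segment
pivot = joined.pivot_table(index="segment", values="amount", aggfunc="sum")
```

Here `pivot.loc["retail", "amount"]` sums the three retail orders (140.0), while the single business order gives 90.0.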
Day 2 - Machine Learning: regression algorithms
- Introduction to regression: problem, data, ML pipeline.
- Model evaluation: metrics (MAE, MSE/RMSE, R²), train/test split.
- Normalization and standardization: when and why.
- Handling categorical variables: One-Hot Encoding and Label Encoding.
- Overfitting: diagnosis and mitigation methods.
- Cross-validation: choosing a validation strategy.
- Grid Search and hyperparameter optimization (including pipeline).
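Several Day 2 topics come together in a scikit-learn `Pipeline` combined with `GridSearchCV`: scaling happens inside each cross-validation fold, and hyperparameters are addressed by step name. The data and parameter grid below are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, invented for the example
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling inside the pipeline avoids leaking test-fold statistics into training
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Hyperparameters are addressed as <step-name>__<parameter>
grid = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="r2",
)
grid.fit(X, y)
```

After fitting, `grid.best_params_` and `grid.best_score_` report the winning configuration and its cross-validated score.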
Day 3 - Machine Learning: classification algorithms
- Introduction to classification: binary and multiclass.
- Classification evaluation: confusion matrix, accuracy, precision/recall, F1, ROC-AUC.
- Overview of classification algorithms (e.g. logistic regression, trees, SVM, kNN).
- Ensemble: combining classifiers (bagging, boosting, stacking - overview and practice).
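One concrete flavor of combining classifiers is soft voting, where the ensemble averages the member models' predicted probabilities. A minimal sketch on synthetic data (the three member models are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=8, random_state=1)

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=1)),
        ("gb", GradientBoostingClassifier(random_state=1)),
    ],
    voting="soft",  # average predicted probabilities across members
)

# Cross-validated accuracy of the ensemble
scores = cross_val_score(voting, X, y, cv=5)
```

Comparing `scores.mean()` against each member's own cross-validated score shows whether the ensemble actually helps on a given dataset.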
Day 4 - Large language models (LLMs)
- Introduction to LLMs: capabilities, limitations, typical business use cases.
- OpenAI API and other models: integration basics, costs, limits, best practices.
- Multimodal models: working with text and images - scenarios and limitations.
- Embeddings: semantic search, clustering, simple recommendations, RAG basics.
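The semantic-search idea behind embeddings (and the retrieval step of RAG) reduces to ranking documents by cosine similarity to a query vector. In practice the vectors come from an embeddings API; the tiny hand-made vectors below exist purely to illustrate the ranking mechanics.

```python
import numpy as np

# Hand-made 3-dimensional "embeddings" (real ones have hundreds
# of dimensions and come from a model API)
corpus = {
    "invoice": np.array([0.9, 0.1, 0.0]),
    "payment": np.array([0.6, 0.5, 0.1]),
    "holiday": np.array([0.0, 0.1, 0.95]),
}
query = np.array([0.85, 0.15, 0.05])  # pretend embedding of a billing question

def cosine(a, b):
    """Cosine similarity: dot product of the unit-normalized vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank corpus documents by similarity to the query, most similar first
ranked = sorted(corpus, key=lambda k: cosine(query, corpus[k]), reverse=True)
```

With these toy vectors, the billing-like query ranks "invoice" first and "holiday" last; swapping in real embeddings changes the vectors but not the search logic.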
No budget available? Get funding!
A program that makes it quick and easy to obtain funding for courses for individual participants.
Why a guaranteed course?
- Guaranteed delivery. The course will take place regardless of the number of participants.
- Knowledge and experience exchange with specialists from other industries.
- Interactive, live-led sessions. Not only theory, but also practical exercises and discussions.
- Flexible remote format. Join from anywhere.
Need Help?
Reach out to learn more about our team and the kinds of tailored solutions we can offer your organization.
Get in Touch