
Statistics Training

Practical Applied Statistics courses


Statistics Training Plans

ID Name Duration (7 clock hours per day) Overview
206560 Statistics - Advanced Course 28 hours The training covers advanced topics in statistics. It presents most of the tools commonly used in research, analysis and forecasting, and explains the theory behind the statistical formulas. The course is not tied to any specific domain of knowledge, but it can be tailored if all participants share the same goals and expectations (especially as a closed, on-site training). Basic applications (notably Excel or OpenOffice) are used during the training. Working with bivariate data series: Introduction; The value and properties of Pearson's correlation coefficient; Examples and exercises; The second law of variance (Variance Sum Law II); Examples and exercises. Probability: Introduction; Conditional probability; Simulating random processes; Examples and exercises. The binomial distribution: Reliability coefficients; Bayes' theorem; The Monty Hall paradox; Examples and exercises. The normal distribution: Introduction; Applications of the normal distribution; Examples and exercises; Standardization; Normal approximation of the binomial distribution; Examples and exercises. Sampling distributions: Introduction; Examples and exercises; The Central Limit Theorem; Sample mean, variance and standard deviation; Difference between sample means; The sampling distribution of correlation coefficients; Calculating probabilities; Examples and exercises. Estimating statistical parameters: Introduction; Degrees of freedom; Properties of estimators; Simulating distributions; Confidence intervals for means, differences between means, and correlations; Examples and exercises. Testing statistical hypotheses: Introduction; Significance tests; Type I and Type II errors; One-tailed and two-tailed tests; Interpreting test results; Steps in hypothesis testing; Significance level and confidence intervals; Examples and exercises. Testing means: A single mean; An example of the t distribution; Difference between means for independent groups; Comparing means across all pairs; Difference between means for correlated series; Comparing means across all pairs for correlated series; Examples and exercises. Power of a statistical test: Introduction; Examples and exercises. Forecasting: Introduction to linear regression; Forecasting based on linear regression; Non-linear regression; Moving averages; Seasonal fluctuations; Examples and exercises. Analysis of variance (ANOVA): Introduction; One-way ANOVA; Multi-factor ANOVA, probabilities and the t distribution; ANOVA for series of unequal length; Within-subjects ANOVA; Power of the analysis of variance; Examples and exercises. Chi-square: The chi-square distribution; Frequency tables; Testing distributions; Cross tabulations; Examples and exercises. Case study and review.
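Although this course works mainly in Excel or OpenOffice, the "Testing means" module is easy to cross-check in a few lines of R. A minimal, illustrative sketch with simulated data (not course material):

```r
# Illustrative sketch: a two-sample t-test with confidence interval,
# mirroring the "Testing means" module. Data below are simulated.
set.seed(42)
group_a <- rnorm(30, mean = 100, sd = 15)   # simulated scores, group A
group_b <- rnorm(30, mean = 108, sd = 15)   # simulated scores, group B

# Welch two-sample t-test: H0 is "no difference between the group means"
result <- t.test(group_a, group_b)
print(result$p.value)    # p-value of the significance test
print(result$conf.int)   # 95% confidence interval for the difference in means
```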
226127 Neural Networks in R 14 hours The training is an introduction to deploying neural networks in real-world applications using the R-project software. Introduction to Neural Networks: What neural networks are; The current status of neural network applications; Neural networks vs regression models; Supervised and unsupervised learning; Overview of available packages: nnet, neuralnet and others, differences between the packages and their limitations; Visualizing neural networks. Applying Neural Networks: The concept of neurons and neural networks; A simplified model of the brain; Capabilities of a single neuron; The XOR problem and the nature of the distribution of values; The polymorphic nature of the sigmoid function; Other activation functions. Construction of neural networks: Connecting neurons; A neural network as a set of nodes; Building a network: neurons, layers, weights; Input and output data; Normalization to the range 0 to 1. Learning Neural Networks: Backpropagation; Propagation steps; Network training algorithms and their range of application; Estimation and the limits of approximation; Examples: OCR and image pattern recognition; Other applications; Implementing a neural network: a modeling job predicting the stock prices of listed companies.
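The XOR problem named in the outline fits in a few lines with the neuralnet package the course mentions. A minimal sketch (assumes neuralnet is installed; training such a tiny network can be sensitive to the random start):

```r
# XOR with the neuralnet package: not linearly separable, so a hidden
# layer is required. install.packages("neuralnet") if needed.
library(neuralnet)

xor_data <- data.frame(
  x1 = c(0, 0, 1, 1),
  x2 = c(0, 1, 0, 1),
  y  = c(0, 1, 1, 0)
)

set.seed(1)
net <- neuralnet(y ~ x1 + x2, data = xor_data, hidden = 2,
                 linear.output = FALSE)   # sigmoid output for a 0/1 target

# compute() runs the trained network on new covariates
round(compute(net, xor_data[, c("x1", "x2")])$net.result, 2)  # ~ 0,1,1,0
```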
417032 Data Mining 21 hours The course can be provided with any tools, including free open-source data mining software and applications. Introduction Data mining as the analysis step of the KDD process ("Knowledge Discovery in Databases") Subfield of computer science Discovering patterns in large data sets Sources of methods Artificial intelligence Machine learning Statistics Database systems What is involved? Database and data management aspects Data pre-processing Model and inference considerations Interestingness metrics Complexity considerations Post-processing of discovered structures Visualization Online updating Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Use and applications Able Danger Behavioral analytics Business analytics Cross Industry Standard Process for Data Mining Customer analytics Data mining in agriculture Data mining in meteorology Educational data mining Human genetic clustering Inference attack Java Data Mining Open-source intelligence Path analysis (computing) Police-enforced ANPR in the UK Reactive business intelligence SEMMA Stellar Wind Talx Zapaday Data dredging, data fishing, data snooping
209790 Statistics - Basic Course 14 hours This course has been created for people who require general statistics skills. This course can be tailored to a specific area of expertise like market research, biology, manufacturing, public sector research, etc. Introduction: Descriptive statistics; Statistical inference; Sampling demonstration; Variables; Percentiles; Measurement; Types of scales; Measurement demonstration; Basics of data collection; Distributions; Summation notation; Linear transformations; Exercises. Graphing distributions: Qualitative variables; Quantitative variables; Frequency distributions; Histograms; Frequency polygons; Box plots; Box plot demonstration; Bar charts; Line graphs; Exercises. Summarizing distributions: Central tendency; What is central tendency?; Measures of the central tendency of a distribution; Balance scale simulation; Absolute differences simulation; Squared differences simulation; Median and arithmetic mean; Median and arithmetic mean simulation; Additional measures; Comparing measures. Variability: Measures of variability; Variance estimation simulation; Shapes of distributions; Comparing distributions demonstration; Effects of linear transformations; Variance Sum Law I; Exercises. Normal distributions: History; Areas under normal distributions; Differences between normal distributions demonstration; The standard normal distribution; Normal approximation to the binomial distribution; Normal approximation demonstration; Exercises.
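The descriptive-statistics and plotting topics above map directly onto base R. A short sketch with hypothetical data (any numeric vector works the same way):

```r
# Descriptive statistics and two of the plots named in the outline,
# using only base R and made-up sample values.
values <- c(12, 15, 15, 18, 21, 22, 22, 22, 25, 31, 40)

mean(values)                   # arithmetic mean
median(values)                 # median
quantile(values, c(.25, .75))  # percentiles
sd(values); var(values)        # spread: standard deviation and variance

hist(values, main = "Histogram")    # frequency distribution
boxplot(values, main = "Box plot")  # five-number summary at a glance
```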
296689 Programming with Big Data in R 21 hours Introduction to Programming Big Data with R (pbdR) Setting up your environment to use pbdR Scope and tools available in pbdR Packages commonly used with Big Data alongside pbdR Message Passing Interface (MPI) Using MPI from pbdR Parallel processing Point-to-point communication Send Matrices Summing Matrices Collective communication Summing Matrices with Reduce Scatter / Gather Other MPI communications Distributed Matrices Creating a distributed diagonal matrix SVD of a distributed matrix Building a distributed matrix in parallel Statistics Applications Monte Carlo Integration Reading Datasets Reading on all processes Broadcasting from one process Reading partitioned data Distributed Regression Distributed Bootstrap
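The "Collective communication" and "Monte Carlo Integration" items can be combined in one small pbdMPI sketch (assumes the pbdMPI package and an MPI runtime are installed; the script name is illustrative):

```r
# monte_carlo_pi.r - estimate pi by Monte Carlo integration across MPI ranks.
# Run with e.g.: mpirun -np 4 Rscript monte_carlo_pi.r
library(pbdMPI)
init()

n_local <- 1e6
set.seed(comm.rank() + 1)           # a different seed on every rank
x <- runif(n_local); y <- runif(n_local)
hits_local <- sum(x^2 + y^2 <= 1)   # points inside the unit quarter-circle

# Collective communication: sum the per-process counts across all ranks
hits_total <- allreduce(hits_local, op = "sum")
n_total    <- allreduce(n_local,    op = "sum")

comm.print(4 * hits_total / n_total)  # pi estimate, printed once by rank 0
finalize()
```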
417008 Hadoop for Developers 14 hours Introduction What is Hadoop? What does it do? How does it do it? The Motivation for Hadoop Problems with Traditional Large-Scale Systems Introducing Hadoop Hadoopable Problems Hadoop: Basic Concepts and HDFS The Hadoop Project and Hadoop Components The Hadoop Distributed File System Introduction to MapReduce MapReduce Overview Example: WordCount Mappers Reducers Hadoop Clusters and the Hadoop Ecosystem Hadoop Cluster Overview Hadoop Jobs and Tasks Other Hadoop Ecosystem Components Writing a MapReduce Program in Java Basic MapReduce API Concepts Writing MapReduce Drivers, Mappers, and Reducers in Java Speeding Up Hadoop Development by Using Eclipse Differences Between the Old and New MapReduce APIs Writing a MapReduce Program Using Streaming Writing Mappers and Reducers with the Streaming API Unit Testing MapReduce Programs Unit Testing The JUnit and MRUnit Testing Frameworks Writing Unit Tests with MRUnit Running Unit Tests Delving Deeper into the Hadoop API Using the ToolRunner Class Setting Up and Tearing Down Mappers and Reducers Decreasing the Amount of Intermediate Data with Combiners Accessing HDFS Programmatically Using The Distributed Cache Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners Practical Development Tips and Techniques Strategies for Debugging MapReduce Code Testing MapReduce Code Locally by Using LocalJobRunner Writing and Viewing Log Files Retrieving Job Information with Counters Reusing Objects Creating Map-Only MapReduce Jobs Partitioners and Reducers How Partitioners and Reducers Work Together Determining the Optimal Number of Reducers for a Job Writing Customer Partitioners Data Input and Output Creating Custom Writable and Writable-Comparable Implementations Saving Binary Data Using SequenceFile and Avro Data Files Issues to Consider When Using File Compression Implementing Custom InputFormats and OutputFormats Common MapReduce Algorithms Sorting and Searching Large Data Sets Indexing Data Computing Term Frequency — Inverse Document Frequency Calculating Word Co-Occurrence Performing Secondary Sort Joining Data Sets in MapReduce Jobs Writing a Map-Side Join Writing a Reduce-Side Join Integrating Hadoop into the Enterprise Workflow Integrating Hadoop into an Existing Enterprise Loading Data from an RDBMS into HDFS by Using Sqoop Managing Real-Time Data Using Flume Accessing HDFS from Legacy Systems with FuseDFS and HttpFS An Introduction to Hive, Imapala, and Pig The Motivation for Hive, Impala, and Pig Hive Overview Impala Overview Pig Overview Choosing Between Hive, Impala, and Pig An Introduction to Oozie Introduction to Oozie Creating Oozie Workflows
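The course writes its mappers and reducers in Java, but the "Writing a MapReduce Program Using Streaming" module allows any scripting language. A hypothetical WordCount written as two R streaming scripts (the file names mapper.R and reducer.R are illustrative, not course files):

```r
#!/usr/bin/env Rscript
# mapper.R - Streaming mappers read lines on stdin and emit
# "key<TAB>value" pairs on stdout.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  for (word in strsplit(tolower(line), "[^a-z]+")[[1]]) {
    if (nchar(word) > 0) cat(word, "\t1\n", sep = "")
  }
}
close(con)
```

```r
#!/usr/bin/env Rscript
# reducer.R - streaming guarantees input sorted by key, so counts for
# one word arrive contiguously and can be summed in a single pass.
con <- file("stdin", open = "r")
current <- NULL; total <- 0
while (length(line <- readLines(con, n = 1)) > 0) {
  parts <- strsplit(line, "\t")[[1]]
  if (!identical(parts[1], current)) {
    if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
    current <- parts[1]; total <- 0
  }
  total <- total + as.numeric(parts[2])
}
if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
close(con)
```

These could be wired together with the streaming jar along the lines of `hadoop jar hadoop-streaming.jar -files mapper.R,reducer.R -mapper mapper.R -reducer reducer.R -input in -output out`, with the jar path and HDFS directories adjusted to the cluster at hand.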
238322 Preparation for the CCAH Exam (Certified Administrator for Apache Hadoop) 35 hours The course is intended for IT professionals working on solutions that require storing and processing large data sets in distributed systems. Course goals: gaining knowledge of Apache Hadoop administration; preparation for the CCAH exam (Cloudera Certified Administrator for Apache Hadoop). 1: HDFS (38%) Functions of the individual Apache Hadoop daemons Storing and processing data in Hadoop When to choose Hadoop HDFS architecture and operation HDFS federations HDFS High Availability HDFS security (Kerberos) The file read and write path in HDFS 2: MapReduce (10%) How MapReduce v1 works How MapReduce v2 (YARN) works 3: Planning a Hadoop Cluster (12%) Choosing hardware and an operating system Requirements analysis Tuning kernel parameters and storage configuration Matching hardware configuration to requirements System scalability: CPU load, RAM, storage (IO) and system capacity Storage scalability: JBOD vs RAID, network storage and the impact of virtualization on performance Network topologies: network load in Hadoop (HDFS and MapReduce) and connection optimization 4: Hadoop Cluster Installation and Administration (17%) The impact of failures on cluster operation Log monitoring Basic metrics used by a Hadoop cluster Hadoop cluster monitoring tools Hadoop cluster administration tools 5: Resource Management (6%) Queue architecture and functions Resource allocation with FIFO queues Resource allocation with fair queues Resource allocation with capacity queues 6: Monitoring and Logging (12%) Monitoring metrics Managing the NameNode and JobTracker from the web GUI Configuring log4j Monitoring Hadoop daemons Monitoring CPU usage on key servers in the cluster Monitoring RAM and swap usage Managing and browsing logs Interpreting logs 7: The Hadoop Ecosystem (5%) Auxiliary tools
416995 Machine Learning Fundamentals with R 14 hours The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the R programming platform and its various libraries, and based on a multitude of practical examples, this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Science applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means
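The unsupervised-learning module above can be previewed with base R alone. A minimal sketch using the built-in iris data (the species labels are held out and used only to inspect how well the clusters recover them):

```r
# K-means clustering on the iris measurements, one of the topics listed above.
set.seed(123)
features <- scale(iris[, 1:4])   # standardize before distance-based methods
fit <- kmeans(features, centers = 3, nstart = 25)

table(cluster = fit$cluster, species = iris$Species)  # clusters vs. true labels
fit$tot.withinss                                      # within-cluster sum of squares
```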
463718 Introduction to Neo4j - a Graph Database 7 hours Introduction to Neo4j: Installation and configuration; The structure of a Neo4j application; Relational vs graph representations of data. The graph data model: Can and should a given problem be represented as a graph?; Selected use cases and modelling of a chosen problem; The key concepts of the Neo4j graph model: Node, Relationship, Property, Label. The Cypher query language and graph operations: Creating and managing a schema with Cypher; CRUD operations on data; Cypher queries and their SQL equivalents; Graph algorithms used in Neo4j; The REST interface. Basic administration: Creating and restoring backups; Managing the database from the browser; Importing and exporting data in universal formats.
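A hypothetical sketch of driving Neo4j from R with the RNeo4j package; the package choice, server URL and credentials are all assumptions here, while the create-and-match pattern itself matches the CRUD and "Cypher vs SQL" topics above:

```r
# CRUD against a local Neo4j server via its REST interface, using RNeo4j
# (assumed installed; adjust URL and credentials to your installation).
library(RNeo4j)
graph <- startGraph("http://localhost:7474/db/data/",
                    username = "neo4j", password = "secret")

# Create two nodes and a relationship
ala <- createNode(graph, "Person", name = "Ala")
ola <- createNode(graph, "Person", name = "Ola")
createRel(ala, "KNOWS", ola)

# A Cypher query returned as a data.frame - roughly SQL's SELECT with a JOIN
cypher(graph, "MATCH (a:Person)-[:KNOWS]->(b:Person)
               RETURN a.name AS from, b.name AS to")
```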
2985 Excel for Statistical Analysis 14 hours The course is intended for analysts, researchers, statisticians, people who use MS Excel in their daily work, and anyone who wants to learn what statistical analysis in Excel has to offer. The course improves knowledge of Excel, statistics and statistical analysis, and increases the effectiveness and efficiency of work and research. The training covers how to use the Analysis ToolPak in Microsoft Excel and its statistical functions, and how to perform statistical procedures. It also shows the limitations of Excel and explains how to overcome them. Aggregating data in Excel: Statistical functions; Grouping; Subtotals; Pivot tables. Analysing relationships in data: The normal distribution; Descriptive statistics; Linear correlation; Regression analysis; Covariance. Analysing data over time: Trends and regression lines; Linear, logarithmic, polynomial, power and exponential trends; Moving-average smoothing; Analysis of seasonal fluctuations. Comparing populations: Confidence interval for the mean; Hypothesis test for the population mean; Difference between the means of two populations; ANOVA: analysis of variance; Goodness-of-fit test for discrete random variables; Test of independence: contingency tables; Hypothesis testing for the variances of two populations. Forecasting: Extrapolation.
238323 Administrator Training for Apache Hadoop 35 hours The main goal of the training is to gain advanced knowledge of Apache Hadoop administration in MapReduce and YARN environments. The training focuses mainly on the architecture of Hadoop, in particular the HDFS file system and the MapReduce and YARN programming models, as well as topics related to planning, installing, configuring, administering, managing and monitoring a Hadoop cluster. Other Big Data topics such as HBase, Cassandra, Impala, Pig, Hive and Sqoop are also covered, though briefly. The course is aimed mainly at IT professionals who want to prepare for and pass the CCAH exam (Cloudera Certified Administrator for Apache Hadoop). 1: HDFS (17%) Functions of the individual Apache Hadoop daemons Storing and processing data in Hadoop When to choose Hadoop HDFS architecture and operation HDFS federations HDFS High Availability HDFS security (Kerberos) Case studies The file read and write path in HDFS The HDFS command-line interface 2: YARN and MapReduce version 2 (MRv2) (17%): YARN configuration YARN deployment YARN architecture and operation Resource allocation in YARN Job execution flow in YARN Migration from MRv1 to YARN 3: Planning a Hadoop Cluster (16%) Requirements analysis and hardware selection Requirements analysis and operating system selection Tuning kernel parameters and storage configuration Matching hardware configuration to requirements Selecting cluster components and auxiliary tools System scalability: CPU load, RAM, storage (IO) and system capacity Storage scalability: JBOD vs RAID, network storage and the impact of virtualization on performance Network topologies: network load in Hadoop (HDFS and MapReduce) and connection optimization 4: Hadoop Cluster Installation and Administration (25%) The impact of failures on cluster operation Log monitoring Basic metrics used by a Hadoop cluster Hadoop cluster monitoring tools Auxiliary tools: Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, Pig and others Hadoop cluster administration tools 5: Resource Management (10%) Queue architecture and functions Resource allocation with FIFO queues Resource allocation with fair queues Resource allocation with capacity queues 6: Monitoring and Logging (15%) Monitoring metrics Managing the NameNode and JobTracker from the web GUI Monitoring Hadoop daemons Monitoring CPU usage on key servers in the cluster Monitoring RAM and swap usage Managing and browsing logs Interpreting logs
417026 Advanced R Programming 7 hours This course is for data scientists and statisticians who already have basic R and C++ coding skills and need advanced R programming skills. The purpose is to give a practical advanced R programming course to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. R's environment Object oriented programming in R S3 S4 Reference classes Performance profiling Exception handling Debugging R code Creating R packages Unit testing C/C++ coding in R SEXPs Calling dynamically loaded libraries from R Writing and compiling C/C++ code from R Improving R's performance with a C++ linear algebra library
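The "C/C++ coding in R" topic is commonly demonstrated with Rcpp, which wraps the SEXP plumbing listed above. A minimal sketch (Rcpp and a working C++ toolchain are assumptions; the course outline itself speaks more generally of SEXPs and dynamically loaded libraries):

```r
# Compile a tiny C++ function from R and compare it against base R.
library(Rcpp)

cppFunction('
  double sumC(NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); ++i) total += x[i];  // plain C++ loop
    return total;
  }
')

x <- runif(1e6)
all.equal(sumC(x), sum(x))   # the compiled version matches base R

# Performance profiling, as listed above, can then compare the two:
system.time(for (i in 1:100) sumC(x))
system.time(for (i in 1:100) sum(x))
```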
463936 OpenStack Administration - Basic + Intermediate (Certified System Administrator for OpenStack) 28 hours The course is dedicated to IT engineers and architects who are looking for a solution to host private or public IaaS (Infrastructure as a Service) cloud. Course goal: gaining basic knowledge regarding OpenStack design, installation and administration, preparation to the EX210 (Red Hat Certified System Administrator in Red Hat OpenStack) exam, automated and manual OpenStack cluster installation and configuration. Introduction: What is OpenStack? Foundations of Cloud Computing Virtualization vs clustering OpenStack evolution OpenStack distributions OpenStack releases OpenStack deployment solutions OpenStack services OpenStack competitors EX210 exam OpenStack Administration: Basic terms IaaS model Supported hypervisors Supported image formats Basic architecture Design concerns Installation concerns Configuration concerns Administration concerns Automation concerns Growth planning High Availability concerns Automated OpenStack installation with PackStack How to download and execute RC files How to create an external network in Neutron How to upload an image to Glance How to create a new flavor in Nova How to update default Nova and Neutron quotas How to create a new tenant in Keystone How to create a new user in Keystone How to manage roles in Keystone How to create a tenant network in Neutron How to create a router in Neutron How to manage router’s interfaces in Neutron How to update security groups in Neutron How to upload RSA key-pair to the project How to allocate floating IPs to the project How to launch an instance from image in Nova How to associate floating IPs with instances How to create a new volume in Cinder How to attach the volume to the instance How to take a snapshot of the instance How to take a snapshot of the volume How to launch an instance from snapshot in Nova How to create a volume from snapshot in Cinder How to create a container in Swift How to upload data to the container in Swift Basic Environment: Prerequisites Nodes and networks AMQP Manual installation and configuration of basic environment Keystone: Objects API concerns Components Backends Authentication process Manual Keystone Installation Manual Keystone Configuration & Administration Glance: Components Backends Manual Glance Installation Manual Glance Configuration & Administration Nova: Components Flavors Instances launching and termination process Schedulers awareness Remote access Manual Nova Installation Manual Nova Configuration & Administration Neutron: Components Network virtualization Virtual network devices L2 agent OVS ML2 Bringing it all together - Compute Bringing it all together - Networker Virtual networks L3 agent DHCP agent Manual Neutron Installation Manual Neutron Configuration & Administration Horizon: Backends Manual Horizon Installation Manual Horizon Configuration & Administration   Cinder: Volumes Components Backends Manual Cinder Installation Manual Cinder Configuration & Administration Swift: What is object storage? 
Replication Structure Data addressing Modified consistent hashing ring Data placement Metadata placement Part power Ring internals Ring builder Components Backends Manual Swift Installation Manual Swift Configuration & Administration Heat: Use cases Components Templates “Hello World” template Manual Heat Installation Manual Heat Configuration & Administration Ceilometer: Use cases Basic concepts Components Polling agents Backends Manual Ceilometer Installation Manual Ceilometer Configuration & Administration Adding Compute Node: Manual addition of a Compute Node
15086 Market Forecasting 14 hours The course has been prepared for managers, business analysts and entrepreneurs who would like to improve the forecasting methods they use, as well as for those who are only considering introducing such methods. The tools and methods discussed in the course can later be applied to: forecasting sales, setting sales plans, managing sales channels; forecasting market behaviour, economic risk and economic change; forecasting technological change; forecasting product demand; supply chain management. The course aims to present participants with a series of tools, frameworks, methodologies and algorithms useful when trying to predict the future based on data analysis. During the course participants will also learn to apply the methods discussed in standard tools such as MS Excel or the open-source R statistical package. The methods and principles presented in the course can easily be implemented in any other software (e.g. SAS, SPSS, Statistica, MINITAB, etc.). Problems facing forecasters Customer demand planning Investor uncertainty Economic planning Seasonal changes in demand/utilization Roles of risk and uncertainty Time series methods Moving average Exponential smoothing Extrapolation Linear prediction Trend estimation Growth curve Econometric methods (causal methods) Regression analysis using linear regression or non-linear regression Autoregressive moving average (ARMA) Autoregressive integrated moving average (ARIMA) Econometrics Judgemental methods Surveys Delphi method Scenario building Technology forecasting Forecast by analogy Simulation and other methods Simulation Prediction market Probabilistic forecasting and Ensemble forecasting Reference class forecasting
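Two of the time-series methods named above, exponential smoothing and ARIMA, fit in a few lines of base R. An illustrative sketch using the built-in AirPassengers data (no claim that the course uses exactly this dataset):

```r
# Exponential smoothing and seasonal ARIMA forecasts, base R only.
plot(AirPassengers)

# Holt-Winters: exponential smoothing with trend and seasonality
hw <- HoltWinters(AirPassengers)
predict(hw, n.ahead = 12)          # 12-month-ahead forecast

# The classic "airline" ARIMA model, fitted on the log scale
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
exp(predict(fit, n.ahead = 12)$pred)  # back-transform the forecasts
```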
238325 Hadoop Administration 21 hours The main goal of the training is to gain basic-to-intermediate knowledge of Apache Hadoop administration in a MapReduce environment. The training focuses mainly on the architecture of Hadoop, in particular the HDFS file system and the MapReduce programming model, and on topics related to planning, installing, configuring and administering a Hadoop cluster. Other Big Data topics such as HBase, Cassandra, Impala, YARN, Pig, Hive and Sqoop are also covered, though briefly. The course is aimed mainly at IT professionals who either intend to take up Hadoop administration or are looking for solutions for storing and processing large data sets. Course goal: gaining knowledge of Apache Hadoop administration. Introduction to Cloud Computing and Big Data Evolution of Apache Hadoop: HDFS, MapReduce, YARN Installing and configuring Hadoop in pseudo-distributed mode Running MapReduce jobs on Hadoop Planning, installing and configuring an Apache Hadoop cluster Auxiliary software: Pig, Hive, Sqoop, HBase The future of Big Data solutions: Impala, Cassandra
417013 Data Mining with R 14 hours Sources of methods Artificial intelligence Machine learning Statistics Sources of data Pre-processing of data Data Import/Export Data Exploration and Visualization Dimensionality Reduction Dealing with missing values R Packages Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Frequent Pattern Mining Text Mining Decision Trees Regression Neural Networks Sequence Mining Data dredging, data fishing, data snooping
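The "Decision Trees" task listed above has a compact R counterpart in the rpart package (shipped with standard R distributions). A small sketch on the built-in iris data:

```r
# Classification tree: induce rules from data, then inspect the fit.
library(rpart)

tree <- rpart(Species ~ ., data = iris, method = "class")
print(tree)          # the induced rules, one line per split

pred <- predict(tree, iris, type = "class")
table(predicted = pred, actual = iris$Species)   # confusion matrix
```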
463938 OpenStack Administration- Basic 14 hours The course is dedicated to IT engineers and architects who are looking for a solution to host private or public IaaS (Infrastructure as a Service) cloud. Course goal: gaining basic knowledge regarding OpenStack design, installation and administration automated OpenStack cluster installation and configuration Introduction: What is OpenStack? Foundations of Cloud Computing Virtualization vs clustering OpenStack evolution OpenStack distributions OpenStack releases OpenStack deployment solutions OpenStack services OpenStack competitors   OpenStack Administration: Basic terms IaaS model Supported hypervisors Supported image formats Basic architecture Design concerns Installation concerns Configuration concerns Administration concerns Automation concerns Growth planning High Availability concerns Automated OpenStack installation with PackStack How to download and execute RC files How to create an external network in Neutron How to upload an image to Glance How to create a new flavor in Nova How to update default Nova and Neutron quotas How to create a new tenant in Keystone How to create a new user in Keystone How to manage roles in Keystone How to create a tenant network in Neutron How to create a router in Neutron How to manage router’s interfaces in Neutron How to update security groups in Neutron How to upload RSA key-pair to the project How to allocate floating IPs to the project How to launch an instance from image in Nova How to associate floating IPs with instances How to create a new volume in Cinder How to attach the volume to the instance How to take a snapshot of the instance How to take a snapshot of the volume How to launch an instance from snapshot in Nova How to create a volume from snapshot in Cinder How to create a container in Swift How to upload data to the container in Swift   Basic Environment: Prerequisites Nodes and networks AMQP Keystone: Objects API concerns Components Backends Authentication process   Glance: Components Backends   Nova: Components Flavors Instances launching and termination process Schedulers awareness Remote access   Neutron: Components Network virtualization Virtual network devices L2 agent OVS ML2 Bringing it all together - Compute Bringing it all together - Networker Virtual networks L3 agent DHCP agent   Horizon: Backends Cinder: Volumes Components Backends   Swift: What is object storage? Replication Structure Data addressing Modified consistent hashing ring Data placement Metadata placement Part power Ring internals Ring builder Components Backends   Heat: Use cases Components Templates “Hello World” template   Ceilometer: Use cases Basic concepts Components Polling agents Backends   Adding Compute Node: Manual addition of a Compute Node  
19107 Statistics for Managers 35 hours This course has been created for decision makers whose primary goal is not to do the calculations and analyses themselves, but to understand them. The classes use numerous pictures, diagrams, computer simulations, anecdotes and humour to explain statistical concepts thoroughly and to guard participants against common pitfalls. Introduction to Statistics What are Statistics? Importance of Statistics Descriptive Statistics Inferential Statistics Variables Percentiles Measurement Levels of Measurement Basics of Data Collection Distributions Summation Notation Linear Transformations Common Pitfalls Biased samples Average, mean or median? Misleading graphs Semi-attached figures Third variable problem Ceteris paribus Errors in reasoning Understanding confidence level Understanding Results Describing Bivariate Data Probability Normal Distributions Sampling Distributions Estimation Logic of Hypothesis Testing Testing Means Power Prediction ANOVA Chi Square Case Studies Discussion about case studies chosen by the delegates.
295219 Statistical Thinking for Decision Makers 7 hours This course has been created for decision makers whose primary goal is not to do the calculation and the analysis, but to understand them and be able to choose which kinds of statistical methods are relevant in the strategic planning of an organization. For example, a prospective participant may need to decide how many samples must be collected before they can decide whether the product is going to be launched or not. If you need a longer course covering the very basics of statistical thinking, have a look at the 5-day "Statistics for Managers" training. What statistics can offer to Decision Makers Descriptive Statistics Basic statistics - which of the statistics (e.g. median, average, percentiles etc.) are more relevant to different distributions Graphs - significance of getting it right (e.g. how the way the graph is created reflects the decision) Variable types - what variables are easier to deal with Ceteris paribus, things are always in motion Third variable problem - how to find the real influencer Inferential Statistics Probability value - what is the meaning of P-value Repeated experiment - how to interpret repeated experiment results Data collection - you can minimize bias, but not get rid of it Understanding confidence level Statistical Thinking Decision making with limited information how to check how much information is enough prioritizing goals based on probability and potential return (benefit/cost ratio, decision trees) How errors add up Butterfly effect Black swans What is Schrödinger's cat and what is Newton's Apple in business Cassandra Problem - how to measure a forecast if the course of action has changed Google Flu trends - how it went wrong How decisions make forecasts outdated Forecasting - methods and practicality ARIMA Why naive forecasts are usually more responsive How far should a forecast look into the past? Why can more data mean a worse forecast? Statistical Methods useful for Decision Makers Describing Bivariate Data Univariate data and bivariate data Probability why things differ each time we measure them? Normal Distributions and normally distributed errors Estimation Independent sources of information and degrees of freedom Logic of Hypothesis Testing What can be proven, and why it is always the opposite of what we want (Falsification) Interpreting the results of Hypothesis Testing Testing Means Power How to determine a good (and cheap) sample size False positives and false negatives and why it is always a trade-off
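The sample-size decision raised in the overview ("how many samples before we can decide?") can be made concrete with one call in base R. The effect size and error rates below are illustrative assumptions, not course data:

```r
# Required sample size per group for a two-sample t-test.
power.t.test(delta = 0.5,       # smallest difference worth detecting
             sd    = 1,         # assumed standard deviation
             sig.level = 0.05,  # accepted false-positive rate
             power = 0.8)       # 1 - accepted false-negative rate
# ...returns n, the sample size per group (about 64 here)
```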
118127 The MapReduce Model in the Apache Hadoop Implementation 14 hours The training is aimed at organizations that want to deploy solutions for processing large data sets on clusters. Data Mining and Business Intelligence: Introduction; Application areas; Capabilities; Foundations of data exploration and knowledge discovery. Big Data: What do we mean by Big Data?; Big Data vs Data Mining. MapReduce: Description of the model; Example applications; Statistics; The cluster model. Hadoop: What is Hadoop?; Installation; Basic configuration; Cluster settings; Architecture and configuration of the Hadoop Distributed File System; Commands and console usage; The DistCp tool; MapReduce and Hadoop Streaming; Administration and configuration of Hadoop On Demand; Alternative solutions.
463964 MATLAB Fundamental 21 hours This three-day course provides a comprehensive introduction to the MATLAB technical computing environment. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include: Working with the MATLAB user interface Entering commands and creating variables Analyzing vectors and matrices Visualizing vector and matrix data Working with data files Working with data types Automating commands with scripts Writing programs with logic and flow control Writing functions Part 1 A Brief Introduction to MATLAB Objectives: Offer an overview of what MATLAB is, what it consists of, and what it can do for you An Example: C vs. MATLAB MATLAB Product Overview MATLAB Application Fields What can MATLAB do for you? The Course Outline Working with the MATLAB User Interface Objective: Get an introduction to the main features of the MATLAB integrated design environment and its user interfaces. Get an overview of course themes. MATLAB Interface Reading data from file Saving and loading variables Plotting data Customizing plots Calculating statistics and best-fit line Exporting graphics for use in other applications Variables and Expressions Objective: Enter MATLAB commands, with an emphasis on creating and accessing data in variables. Entering commands Creating variables Getting help Accessing and modifying values in variables Creating character variables Analysis and Visualization with Vectors Objective: Perform mathematical and statistical calculations with vectors, and create basic visualizations. See how MATLAB syntax enables calculations on whole data sets with a single command. Calculations with vectors Plotting vectors Basic plot options Annotating plots Analysis and Visualization with Matrices Objective: Use matrices as mathematical objects or as collections of (vector) data. Understand the appropriate use of MATLAB syntax to distinguish between these applications. Size and dimensionality Calculations with matrices Statistics with matrix data Plotting multiple columns Reshaping and linear indexing Multidimensional arrays Part 2 Automating Commands with Scripts Objective: Collect MATLAB commands into scripts for ease of reproduction and experimentation. As the complexity of your tasks increases, entering long sequences of commands in the Command Window becomes impractical. A Modelling Example The Command History Creating script files Running scripts Comments and Code Cells Publishing scripts Working with Data Files Objective: Bring data into MATLAB from formatted files. Because imported data can be of a wide variety of types and formats, emphasis is given to working with cell arrays and date formats. Importing data Mixed data types Cell arrays Conversions amongst numerals, strings, and cells Exporting data Multiple Vector Plots Objective: Make more complex vector plots, such as multiple plots, and use color and string manipulation techniques to produce eye-catching visual representations of data. Graphics structure Multiple figures, axes, and plots Plotting equations Using color Customizing plots Logic and Flow Control Objective: Use logical operations, variables, and indexing techniques to create flexible code that can make decisions and adapt to different situations.
Explore other programming constructs for repeating sections of code, and constructs that allow interaction with the user. Logical operations and variables Logical indexing Programming constructs Flow control Loops Matrix and Image Visualization Objective: Visualize images and matrix data in two or three dimensions. Explore the difference in displaying images and visualizing matrix data using images. Scattered Interpolation using vector and matrix data 3-D matrix visualization 2-D matrix visualization Indexed images and colormaps True color images Part 3 Data Analysis Objective: Perform typical data analysis tasks in MATLAB, including developing and fitting theoretical models to real-life data. This leads naturally to one of the most powerful features of MATLAB: solving linear systems of equations with a single command. Dealing with missing data Correlation Smoothing Spectral analysis and FFTs Solving linear systems of equations Writing Functions Objective: Increase automation by encapsulating modular tasks as user-defined functions. Understand how MATLAB resolves references to files and variables. Why functions? Creating functions Adding comments Calling subfunctions Workspaces Subfunctions Path and precedence Data Types Objective: Explore data types, focusing on the syntax for creating variables and accessing array elements, and discuss methods for converting among data types. Data types differ in the kind of data they may contain and the way the data is organized. MATLAB data types Integers Structures Converting types File I/O Objective: Explore the low-level data import and export functions in MATLAB that allow precise control over text and binary file I/O. These functions include textscan, which provides precise control of reading text files. Opening and closing files Reading and writing text files Reading and writing binary files Conclusion Objectives: Summarise what we have learnt A summary of the course Other upcoming courses on MATLAB Note that the course as actually delivered might be subject to minor discrepancies from the outline above, without prior notification.
296305 Data Science in Business 35 hours Data science is a new term which to a large extent refreshes the public image of statistics, and of business analytics in particular. Nate Silver, author of the best-selling The Signal and the Noise, observed that it makes the statistician's profession sound "sexier". The same goes for business analytics: it is entirely unjustified to assume that business analysts fail to notice technological progress and do not develop over time. For advertising purposes such a largely redundant term as data science is quite useful, but let us remember that business analysts and statisticians have long dealt with the problems which, mainly owing to technological progress, have now become so popular. 1. The logic of business data analytics: 1.1 The ubiquity of opportunities to use data; 1.2 Two examples: hurricanes and customer behaviour; 1.3 Data science, engineering and data-driven decision making; 1.4 Data processing and "Big Data"; 1.5 From Big Data 1.0 to Big Data 2.0; 1.6 Data and data analytics as strategic assets; 1.7 The logic of data analytics: summary. 2. Business problems and data science solutions: 2.1 From a business problem to data mining; 2.2 Supervised and unsupervised methods; 2.3 Data mining and its results; 2.4 Consequences for managing data science projects; 2.5 Analytical techniques and technologies; 2.6 Summary. 3. Predictive modelling, from correlation to supervised segmentation: 3.1 Models, induction and prediction; 3.2 Supervised segmentation; 3.3 Visualizing results; 3.4 Trees as sets of rules; 3.5 Probability estimation; 3.6 Case study; 3.7 Summary. 4. Fitting a model to data: 4.1 Classification with mathematical functions; 4.2 Regression; 4.3 Class probability estimation and logistic "regression"; 4.4 Non-linear functions; 4.5 Neural networks; 4.6 Summary. 5. Overfitting and how to avoid it: 5.1 Generalization; 5.2 Overfitting; 5.3 Analysing the overfitting problem; 5.4 Examples; 5.5 Techniques for avoiding overfitting; 5.6 Learning curves; 5.7 Complexity control; 5.8 Summary. 6. Similarity, neighbours and clusters: 6.1 Similarity and distance measures; 6.2 Nearest neighbours and inference rules; 6.3 Key techniques; 6.4 Cluster analysis; 6.5 Applications to solving business problems. 7. When is a model good?: 7.1 Classifiers used in model evaluation; 7.2 Generalizations beyond classification; 7.3 An analytical framework; 7.4 Examples of applying basic evaluation techniques; 7.5 Summary. 8. Visualizing models: 8.1 Using ranks; 8.2 Profit curves; 8.3 ROC (Receiver Operating Characteristic) curves and graphs; 8.4 The area under the ROC curve; 8.5 Cumulative response; 8.6 Examples; 8.7 Summary. 9. Evidence and probabilities: 9.1 Example: targeting customers; 9.2 Probabilistic combination of evidence; 9.3 Applying Bayes' rule; 9.4 Building a model; 9.5 An example of applying the model; 9.6 Summary. 10. Representing and mining text: 10.1 Why is text important?; 10.2 Why is working with text hard?; 10.3 Representation; 10.4 Example; 10.5 Entropy and text; 10.6 Beyond the bag of words; 10.7 Mining news; 10.8 Summary. 11. Analytical engineering: case studies.
12. Other tasks and techniques: 12.1 Co-occurrences and associations; 12.2 Profiling; 12.3 Link prediction; 12.4 Data reduction and selection; 12.5 Bias, distortion and variance; 12.6 Case studies; 12.7 Summary. 13. Business strategy and data science: 13.1 Redux; 13.2 Achieving competitive advantage; 13.3 Sustaining the advantage; 13.4 Acquiring resources; 13.5 New ideas and development; 13.6 Organizational maturity. 14. How to run reviews of data science projects. 15. Conclusion.
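Chapters 4.3 and 8 above (class probability estimation via logistic regression, and ROC evaluation) can be tied together in one short R sketch. The data are synthetic and the pROC package is an assumption; any ROC implementation would do:

```r
# Logistic regression scores evaluated with an ROC curve and AUC.
library(pROC)

set.seed(7)
n <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1 - 0.8 * x2))  # synthetic outcome

model  <- glm(y ~ x1 + x2, family = binomial)  # logistic regression (ch. 4.3)
scores <- predict(model, type = "response")    # estimated class probabilities

roc_obj <- roc(y, scores)
auc(roc_obj)     # area under the ROC curve (ch. 8.4)
plot(roc_obj)    # the ROC curve itself (ch. 8.3)
```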
164951 Apache Solr - a Full-Text Search Server 14 hours The training is aimed at people looking for a tool that makes full-text searching of large data resources easier. Introduction: Apache Lucene; What Solr is; Installation. Schema and text analysis: Modelling the schema; Configuring schema.xml; Text analysis. Building the index: Importing data from popular formats; Indexing documents; Using the Solr API. Searching: Query basics; Sorting and filtering; Using scoring; Function basics; Request handling; Formatting search results; Faceting. Advanced topics: Deploying and configuring the server; Integrating Solr with other libraries and mechanisms; Search components; Scaling issues.
417091 Semantic Web Overview 7 hours The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Semantic Web Overview Introduction Purpose Standards Ontology Projects Resource Description Framework (RDF) Introduction Motivation and Goals RDF Concepts RDF Vocabulary URI and Namespace (Normative) Datatypes (Normative) Abstract Syntax (Normative) Fragment Identifiers
22544 The Practitioner’s Guide to Multivariate Techniques 14 hours The introduction of the digital computer, and now the widespread availability of computer packages, has opened up a hitherto difficult area of statistics: multivariate analysis. Previously the formidable computing effort associated with these procedures presented a real barrier. That barrier has now disappeared and the analyst can therefore concentrate on an appreciation and an interpretation of the findings. Multivariate Analysis of Variance (MANOVA) Whereas the Analysis of Variance technique (ANOVA) investigates possible systematic differences between prescribed groups of individuals on a single variable, the technique of Multivariate Analysis of Variance is simply an extension of that procedure to numerous variates viewed collectively. These variates could be distinct in nature; for example Height, Weight etc, or repeated measures of a single variate over time or over space. When the variates are repeated measures over time or space, the analyses may often be reduced to a succession of univariate analyses, with easier interpretation. This procedure is often referred to as Repeated Measures Analysis. Principal Component Analysis If only two variates are recorded for a number of individuals, the data may conveniently be represented on a two-dimensional plot. If there are ‘p’ variates then one could imagine a plot of the data in ‘p’-dimensional space. The technique of Principal Component Analysis corresponds to a rotation of the axes so that the maximum amounts of variation are progressively represented along the new axes. It has been described as ‘peering into multidimensional space, from every conceivable angle, and selecting as the viewing angle that which contains the maximum amount of variation’. The aim therefore is a reduction of the dimensionality of multivariate data. If for example a very high percentage (say 90%) of the variability is contained in the first two principal components, a plot of these components would be a virtually complete pictorial representation of the variability. Discriminant Analysis Suppose that several variates are observed on individuals from two identified groups. The technique of discriminant analysis involves calculating that linear function of the variates that best separates out the groups. The linear function may therefore be used to identify group membership simply from the pattern of variates. Various methods are available to estimate the success in general of this identification procedure. Canonical Variate Analysis Canonical Variate Analysis is in essence an extension of Discriminant Analysis to accommodate the situation where there are more than two groups of individuals. Cluster Analysis Cluster Analysis, as the name suggests, involves identifying groupings (or clusters) of individuals in multidimensional space. Since here there is no ‘a priori’ grouping of individuals, the identification of so-called clusters is a subjective process subject to various assumptions. Most computer packages offer several clustering procedures that may often give differing results. However the pictorial representation of the so-called ‘clusters’, in diagrams called dendrograms, provides a very useful diagnostic. Factor Analysis If ‘p’ variates are observed on each of ‘n’ individuals, the technique of factor analysis attempts to identify say ‘r’ (< p) so-called factors which determine to a large extent the variate values.
The implicit assumption here therefore is that the entire array of ‘p’ variates is controlled by ‘r’ factors. For example the ‘p’ variates could represent the performance of students in numerous examination subjects, and we wish to determine whether a few attributes such as numerical ability, linguistic ability could account for much of the variability. The difficulties here stem from the fact that the so-called factors are not directly observable, and indeed may not really exist. Factor analysis has been viewed very suspiciously over the years, because of the measure of speculation involved in the identification of factors. One popular numerical procedure starts with the rotation of axes using principal components (described above) followed by a rotation of the factors identified.
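As a companion to the description of Principal Component Analysis above, a compact base-R sketch; the built-in iris measurements stand in for the 'p' variates of the text:

```r
# PCA: rotate the axes so variance is progressively maximized along them.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

summary(pca)   # proportion of variance captured by each component

# If the first two components carry most of the variability, a 2-D plot
# is a near-complete pictorial representation of the data:
plot(pca$x[, 1:2], col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2")
```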
296307 How to Lie with Statistics: Spotting Statistical Errors and Abuse 21 hours Despite the provocative title, the goal of this training is not to teach participants manipulation techniques. The primary goal is to teach statistical thinking that leads to the effective evaluation and interpretation of statistical information. Statistics in a more technical form is covered by other trainings, such as: Foundations of statistics with R; Business analytics in applications; Business analytics for practitioners; Data mining and business analytics with R; Introduction to R: data visualization and analysis. Part I. The simplest tricks are surprisingly effective: classics of statistical manipulation. How to show more, less, or no change using the same data. Found in translation, or numbers that never existed. One picture is worth a thousand statistics, or how to change the message with a chart. Is fecit cui prodest and other basic defence techniques. Part II. The worst statistics of all time, or statistics and the media. On sources. The significance of social statistics. Big Data, Big Flaws. Bad statistics and the media. Lost in translation, or mutated statistics. Methods of manipulating numbers. Apples and oranges: on improper comparisons. How rum prices affect salaries, or on correlations, causes and effects. The activist's handbook, or how to attract attention with statistics: are paper bags in supermarkets environmentally friendly? Part III. Not everything is as simple as it might seem. From paper bags, through work efficiency, to child car seats: from primary to secondary sources. A short introduction to statistical distributions. On correlations once again. How not to manipulate yourself: preparing statistical data. Statistical thinking. What next, and a summary.
209766 Big Data Business Intelligence for Telecom & Communication Service Providers 35 hours Overview Communications service providers (CSP) are facing pressure to reduce costs and maximize average revenue per user (ARPU), while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow at a compound annual growth rate (CAGR) of 78 percent to 2016, reaching 10.8 exabytes per month. Meanwhile, CSPs are generating large volumes of data, including call detail records (CDR), network data and customer data. Companies that fully exploit this data gain a competitive edge. According to a recent survey by The Economist Intelligence Unit, companies that use data-directed decision-making enjoy a 5-6% boost in productivity. Yet 53% of companies leverage only half of their valuable data, and one-fourth of respondents noted that vast quantities of useful data go untapped. The data volumes are so high that manual analysis is impossible, and most legacy software systems can’t keep up, resulting in valuable data being discarded or ignored. With Big Data & Analytics’ high-speed, scalable big data software, CSPs can mine all their data for better decision making in less time. Different Big Data products and techniques provide an end-to-end software platform for collecting, preparing, analyzing and presenting insights from big data. Application areas include network performance monitoring, fraud detection, customer churn detection and credit risk analysis. Big Data & Analytics products scale to handle terabytes of data, but implementation of such tools needs a new kind of cloud-based database system like Hadoop or massive-scale parallel computing processors (KPU etc.). This course on Big Data BI for Telco covers all the emerging new areas in which CSPs are investing for productivity gain and opening up new business revenue streams. The course will provide a complete 360-degree overview of Big Data BI in Telco so that decision makers and managers can have a very wide and comprehensive overview of the possibilities of Big Data BI in Telco for productivity and revenue gain. Course objectives The main objective of the course is to introduce new Big Data business intelligence techniques in 4 sectors of Telecom Business (Marketing/Sales, Network Operation, Financial operation and Customer Relation Management).
Students will be introduced to the following: Introduction to Big Data - what the 4Vs (volume, velocity, variety and veracity) of Big Data are - generation, extraction and management from a Telco perspective How Big Data analytics differs from legacy data analytics In-house justification of Big Data - Telco perspective Introduction to the Hadoop Ecosystem - familiarity with all Hadoop tools like Hive, Pig, Spark - when and how they are used to solve Big Data problems How Big Data is extracted for analysis in analytics tools - how Business Analysts can reduce their data collection and analysis pain points through an integrated Hadoop dashboard approach Basic introduction to insight analytics, visualization analytics and predictive analytics for Telco Customer churn analytics and Big Data - how Big Data analytics can reduce customer churn and customer dissatisfaction in Telco - case studies Network failure and service failure analytics from network meta-data and IPDR Financial analysis - fraud, wastage and ROI estimation from sales and operational data Customer acquisition problems - target marketing, customer segmentation and cross-sale from sales data Introduction and summary of all Big Data analytics products and where they fit into the Telco analytics space Conclusion - how to take a step-by-step approach to introduce Big Data Business Intelligence in your organization Target Audience Network operation, Financial Managers, CRM managers and top IT managers in the Telco CIO office. Business Analysts in Telco CFO office managers/analysts Operational managers QA managers Breakdown of topics on daily basis: (Each session is 2 hours) Day-1: Session-1: Business Overview of Why Big Data Business Intelligence in Telco. Case Studies from T-Mobile, Verizon etc. Big Data adoption rate in North American Telco and how they are aligning their future business model and operation around Big Data BI Broad Scale Application Area Network and Service management Customer Churn Management Data Integration & Dashboard visualization Fraud management Business Rule generation Customer profiling Localized Ad pushing Day-1: Session-2: Introduction to Big Data-1 Main characteristics of Big Data - volume, variety, velocity and veracity. MPP architecture for volume. Data Warehouses – static schema, slowly evolving dataset MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc. Hadoop Based Solutions – no conditions on structure of dataset. Typical pattern : HDFS, MapReduce (crunch), retrieve from HDFS Batch- suited for analytical/non-interactive Volume : CEP streaming data Typical choices – CEP products (e.g.
Infostreams, Apama, MarkLogic etc) Less production ready – Storm/S4 NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database Day-1: Session-3: Introduction to Big Data-2 NoSQL solutions KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB) KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB KV Store (Hierarchical) - GT.m, Cache KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord KV Cache - Memcached, Repcached, Coherence, Infinispan, EXtremeScale, JBossCache, Velocity, Terracotta Tuple Store - Gigaspaces, Coord, Apache River Object Database - ZopeDB, DB4O, Shoal Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI Varieties of Data: Introduction to Data Cleaning issues in Big Data RDBMS – static structure/schema, doesn’t promote agile, exploratory environment. NoSQL – semi structured, enough structure to store data without exact schema before storing data Data cleaning issues Day-1: Session-4: Big Data Introduction-3: Hadoop When to select Hadoop? STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration) SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB) Warehousing data = HUGE effort and static even after implementation For variety & volume of data, crunched on commodity hardware – HADOOP Commodity H/W needed to create a Hadoop Cluster Introduction to Map Reduce /HDFS MapReduce – distribute computing over multiple servers HDFS – make data available locally for the computing process (with redundancy) Data – can be unstructured/schema-less (unlike RDBMS) Developer responsibility to make sense of data Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS Day-2: Session-1: Big Data Ecosystem-Building Big Data ETL: universe of Big Data Tools-which one to use and when? Hadoop vs. Other NoSQL solutions For interactive, random access to data HBase (column oriented database) on top of Hadoop Random access to data but restrictions imposed (max 1 PB) Not good for ad-hoc analytics, good for logging, counting, time-series Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access) Flume – Stream data (e.g.
log data) into HDFS Day-2: Session-2: Big Data Management System Moving parts, compute nodes start/fail: ZooKeeper - for configuration/coordination/naming services Complex pipelines/workflows: Oozie – manage workflows, dependencies, daisy-chaining Deploy, configure, cluster management, upgrade etc. (sys admin): Ambari In the cloud: Whirr Day-2: Session-3: Predictive analytics in Business Intelligence-1: Fundamental techniques & machine-learning-based BI: Introduction to machine learning Learning classification techniques Bayesian prediction – preparing a training file Support Vector Machine Neural Network Big Data large-variable problem – Random Forest (RF) Big Data automation problem – multi-model ensemble RF Automation through Soft10-M Agile learning Agent-based learning – example from Telco operations Distributed learning – example from Telco operations Introduction to open source tools for predictive analytics: R, RapidMiner, Mahout Day-2: Session-4: Predictive analytics ecosystem-2: Common predictive analytics problems in telecom Insight analytics Visualization analytics Structured predictive analytics Unstructured predictive analytics Customer profiling Recommendation engines Pattern detection Rule/scenario discovery – failure, fraud, optimization Root cause discovery Sentiment analysis CRM analytics Network analytics Text analytics Technology-assisted review Fraud analytics Real-time analytics Day-3: Session-1: Network operations analytics – root cause analysis of network failures and service interruptions from metadata, IPDR and CRM: CPU usage Memory usage QoS queue usage Device temperature Interface errors IOS versions Routing events Latency variations Syslog analytics Packet loss Performance thresholds Device traps IPDR (IP Detail Record) collection and processing Use of IPDR data for subscriber bandwidth consumption, network interface utilization, modem status and diagnostics, HFC information Day-3: Session-2: Tools for network service failure analysis: Network Summary Dashboard: monitor overall network deployments and track your organization's key performance indicators Peak Period Analysis Dashboard: understand the application and subscriber trends driving peak utilization, with location-specific granularity Routing Efficiency Dashboard: control network costs and build business cases for capital projects with a complete understanding of interconnect and transit relationships Real-Time Entertainment Dashboard: access metrics that matter, including video views, duration, and video quality of experience (QoE) IPv6 Transition Dashboard: investigate the ongoing adoption of IPv6 on your network and gain insight into the applications and devices driving trends Case-Study-1: The Alcatel-Lucent Big Network Analytics (BNA) Data Miner Multi-dimensional mobile intelligence (m.IQ6) Day-3: Session-3: Big Data BI for Marketing/Sales – understanding sales/marketing from sales data (all of these will be shown with a live predictive analytics demo): Identifying the highest-velocity clients Identifying clients for a given product Identifying the right set of products for a client (recommendation engine) Market segmentation techniques Cross-sell and upsell techniques Client segmentation techniques Sales revenue forecasting techniques Day-3: Session-4: BI needed for the Telco CFO office: Overview of the business analytics work needed in a CFO office Risk analysis of new investments Revenue and profit forecasting New client acquisition forecasting Loss forecasting Fraud analytics on finances (details in the next session) Day-4: Session-1: Fraud prevention
BI from Big Data in Telco – fraud analytics: Bandwidth leakage / bandwidth fraud Vendor fraud / overcharging for projects Customer refund/claims fraud Travel reimbursement fraud Day-4: Session-2: From Churn Prediction to Churn Prevention: 3 types of churn: active/deliberate, rotational/incidental, passive/involuntary 3 classifications of churned customers: total, hidden, partial Understanding CRM variables for churn Customer behaviour data collection Customer perception data collection Customer demographics data collection Cleaning CRM data Unstructured CRM data (customer calls, tickets, emails) and its conversion to structured data for churn analysis Social media CRM – a new way to extract the customer satisfaction index Case Study-1: T-Mobile USA: churn reduction by 50% Day-4: Session-3: How to use predictive analysis for root cause analysis of customer dissatisfaction: Case Study-1: linking dissatisfaction to issues – accounting, engineering failures like service interruptions, poor bandwidth service Case Study-2: a Big Data QA dashboard to track the customer satisfaction index from various parameters such as call escalations, criticality of issues, pending service interruption events etc. Day-4: Session-4: Big Data dashboards for quick accessibility and display of diverse data: Integration of the existing application platform with a Big Data dashboard Big Data management Case study of Big Data dashboards: Tableau and Pentaho Using a Big Data app to push location-based advertisements Tracking systems and management Day-5: Session-1: How to justify a Big Data BI implementation within an organization: Defining the ROI of a Big Data implementation Case studies of saving analyst time in collecting and preparing data – an increase in productivity Case studies of revenue gain from reduced customer churn Revenue gain from location-based and other targeted ads An integrated spreadsheet approach to calculating approximate expense vs. revenue gain/savings from a Big Data implementation Day-5: Session-2: Step-by-step procedure for replacing a legacy data system with a Big Data system: Understanding a practical Big Data migration roadmap What important information is needed before architecting a Big Data implementation What are the different ways of calculating the volume, velocity, variety and veracity of data How to estimate data growth Case studies from 2 Telcos Day-5: Sessions 3 & 4: Review of Big Data vendors and their products. Q/A session: Accenture Alcatel-Lucent Amazon (A9) APTEAN (formerly CDC Software) Cisco Systems Cloudera Dell EMC GoodData Corporation Guavus Hitachi Data Systems Hortonworks Huawei HP IBM Informatica Intel Jaspersoft Microsoft MongoDB (formerly 10gen) Mu Sigma NetApp Opera Solutions Oracle Pentaho Platfora QlikTech Quantum Rackspace Revolution Analytics Salesforce SAP SAS Institute Sisense Software AG/Terracotta Soft10 Automation Splunk Sqrrl Supermicro Tableau Software Teradata Think Big Analytics Tidemark Systems VMware (part of EMC)
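As an illustration of the churn-modelling techniques listed in the outline above (Random Forest in Day-2, churn prediction in Day-4), here is a minimal sketch in R rather than any particular vendor product; the data frame crm and its churned column are hypothetical stand-ins for a real CRM extract:

# Minimal churn-scoring sketch; `crm` is a hypothetical CRM extract with
# a factor column `churned` ("yes"/"no") plus numeric usage features.
library(randomForest)

set.seed(42)
train_idx <- sample(nrow(crm), 0.7 * nrow(crm))
train <- crm[train_idx, ]
test  <- crm[-train_idx, ]

# A random forest copes well with the "large variable" problem noted above
rf <- randomForest(churned ~ ., data = train, ntree = 500, importance = TRUE)

pred <- predict(rf, newdata = test, type = "prob")[, "yes"]   # churn risk
head(sort(pred, decreasing = TRUE))   # highest-risk subscribers first
varImpPlot(rf)                        # which CRM variables drive churn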
463778 Survey Research, Sampling Techniques & Estimation 14 hours Survey research: Principles of sample survey design and implementation survey preliminaries sampling methods (probability & non-probability methods) populations & sampling frames survey data collection methods Questionnaire design Design and writing of questionnaires Pre-tests & piloting Planning & organisation of surveys Minimising errors, bias & non-response at the design stage Survey data processing Commissioning surveys/research Sampling Techniques & Estimation: Sampling techniques and their strengths/weaknesses (may overlap with the sampling methods above) Simple Random Sampling Unequal Probability Sampling Stratified Sampling (with proportional-to-size & disproportional selection) Systematic Sampling Cluster Sampling Multi-stage Sampling Quota Sampling Estimation Methods of estimating sample sizes Estimating population parameters using sample estimates Variance and confidence interval estimation Estimating bias/precision Methods of correcting bias Methods of handling missing data Non-response analysis
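To make the sample-size topic concrete, here is a minimal calculation in R, assuming a proportion estimated with a normal approximation; the function name and defaults are illustrative, not from any particular package:

# n = z^2 * p * (1 - p) / e^2 for estimating a proportion,
# inflated for an assumed response rate (non-response adjustment).
sample_size <- function(p = 0.5, e = 0.03, conf = 0.95, response_rate = 1) {
  z <- qnorm(1 - (1 - conf) / 2)      # e.g. 1.96 for 95% confidence
  ceiling(z^2 * p * (1 - p) / e^2 / response_rate)
}

sample_size()                               # worst case p = 0.5: n = 1068
sample_size(e = 0.05, response_rate = 0.6)  # wider margin, 60% response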
62104 Minitab dla Statystyków i Analityków 14 hours The course is aimed at anyone interested in statistical analysis. It provides familiarity with Minitab and will increase the effectiveness and efficiency of your data analysis and improve your knowledge of statistics. Introduction Working with the worksheet Minitab as a spreadsheet (similarities to Excel), importing and exporting data operations on data additional worksheet capabilities in Minitab Charts simple charts trend lines and linear regression Descriptive statistics fitting distributions (normal, Weibull and others) testing distributions for normality Pareto charts correlation hypothesis testing Analysis of variance (ANOVA) Assessing process quality control charts capability tests Design of experiments Generating reports Summary of Minitab's capabilities
417022 Applied Machine Learning 14 hours This training course is for people who would like to apply Machine Learning in practical applications. Audience This course is for data scientists and statisticians who have some familiarity with statistics and know how to program in R. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give participants practical experience in applying Machine Learning methods at work. Sector-specific examples are used to make the training relevant to the audience. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additive Models Classification KNN Bayesian Graphical Models Factor Analysis (FA) Principal Component Analysis (PCA) Independent Component Analysis (ICA) Support Vector Machines (SVM) for regression and classification Boosting Ensemble models Neural networks Hidden Markov Models (HMM) State Space Models Clustering
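As a taste of the course content, a logistic regression (a GLM with a binomial family) can be fitted in a few lines of base R; the built-in mtcars data is used purely for illustration:

# Predict manual vs automatic transmission from weight and horsepower
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(fit)                      # coefficients, deviance, significance

p <- predict(fit, type = "response")             # predicted probabilities
table(predicted = p > 0.5, actual = mtcars$am)   # simple confusion matrix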
209771 IoT (Internet of Things) for Entrepreneurs, Managers and Investors 21 hours Estimates of the market value of the Internet of Things, or IoT, are massive, since by definition the IoT is an integrated and diffused layer of devices, sensors, and computing power that overlays entire consumer, business-to-business, and government industries. The IoT will account for an increasingly huge number of connections: 1.9 billion devices today, and 9 billion by 2018. That year, it will be roughly equal to the number of smartphones, smart TVs, tablets, wearable computers, and PCs combined. In the consumer space, many products and services have already crossed over into the IoT, including kitchen and home appliances, parking, RFID, lighting and heating products, and a number of applications in the Industrial Internet. The underlying technologies of the IoT are nothing new, as M2M communication has existed since the birth of the Internet. What has changed in the last couple of years is the emergence of a number of inexpensive wireless technologies, coupled with the overwhelming adoption of smartphones and tablets in every home. The explosive growth of mobile devices led to the present demand for IoT. Due to the unbounded opportunities in IoT business, a large number of small and medium-sized entrepreneurs have jumped on the bandwagon of the IoT gold rush. Also, due to the emergence of open source electronics and IoT platforms, the cost of developing an IoT system and managing its sizeable production is increasingly affordable. Existing electronic product owners are experiencing pressure to integrate their devices with the Internet or mobile apps. This training is intended as a technology and business review of an emerging industry so that IoT enthusiasts/entrepreneurs can grasp the basics of IoT technology and business. Course objectives The main objective of the course is to introduce emerging technological options, platforms and case studies of IoT implementation in home & city automation (smart homes and cities), the Industrial Internet, healthcare, government, mobile cellular and other areas. Basic introduction to all the elements of IoT: mechanical, electronics/sensor platforms, wireless and wireline protocols, mobile-to-electronics integration, mobile-to-enterprise integration, data analytics and the total control plane. M2M wireless protocols for IoT – WiFi, Zigbee/Z-Wave, Bluetooth, ANT+: when and where to use which one? Mobile/desktop/web apps for registration, data acquisition and control – available M2M data acquisition platforms for IoT: Xively, Omega, NovoTech, etc. Security issues and security solutions for IoT Open source/commercial electronics platforms for IoT: Raspberry Pi, Arduino, ARM mbed (LPC) etc. Open source/commercial enterprise cloud platforms for IoT: Ayla, ioBridge, Libelium, Axeda, Cisco Fog cloud Studies of the business and technology of some common IoT devices: home automation, smoke alarms, vehicles, military, home health etc. Target Audience Investors and IoT entrepreneurs Managers and engineers whose company is venturing into the IoT space Business analysts & investors 1. Day-1: Session-1: Business overview of why IoT is so important Case studies from Nest, Cisco and top industries IoT adoption rate in North America and how companies are aligning their future business model and operation around IoT Broad-scale application areas Smart houses and smart cities Industrial Internet Smart cars Home healthcare Business rule generation for IoT 3-layered architecture of IoT – physical (sensors), communication, and data intelligence 2.
Day-1: Session-2: Introduction to IoT: All about sensors Basic function and architecture of a sensor – sensor body, sensor mechanism, sensor calibration, sensor maintenance, cost and pricing structure, legacy and modern sensor networks – all the basics about sensors Development of sensor electronics – IoT vs legacy, and open source vs traditional PCB design styles Development of sensor communication protocols – from history to modern days: legacy protocols like Modbus, relay, HART to modern-day Zigbee, Z-Wave, X10, Bluetooth, ANT etc. Business drivers for sensor deployment – FDA/EPA regulation, fraud/tampering detection, supervision, quality control and process management Different kinds of calibration techniques – manual, automated, in-field, primary and secondary calibration – and their implications for IoT Powering options for sensors – battery, solar, Witricity, mobile and PoE 3. Day-1: Session-3: Introduction to sensor networks and wireless protocols What is a sensor network? Wireless vs. wireline networks WiFi – the 802.11 family: N to S – applications of each standard and common vendors Zigbee and Z-Wave – the advantage of low-power mesh networking. Long-distance Zigbee. Introduction to different Zigbee chips Bluetooth/BLE: low power vs high power, speed of detection, classes of BLE. Introduction to Bluetooth vendors and their review X10, ANT+ Other long-distance RF communication links LOS vs NLOS links Capacity and throughput calculation Application issues in wireless protocols – power consumption, reliability, PER, QoS, LOS 4. Day-1: Session-4: Review of electronics platforms, production and cost projection PCB vs FPGA vs ASIC design – how to make the decision Prototyping electronics vs production electronics QA certificates for IoT – CE/CSA/UL/IEC/RoHS/IP65: what they are and when they are needed Basic introduction to multi-layer PCB design and its workflow Electronics reliability – basic concepts of FIT and early mortality rate Environmental and reliability testing – basic concepts Basic open source platforms: Arduino, Raspberry Pi, BeagleBone – when are they needed? RedBack, DiamondBack 5. Day-2: Session-1: Conceiving a new IoT product – the product requirement document for IoT State of the present art and review of existing technology in the marketplace Suggestions for new features and technologies based on market analysis and patent issues Detailed technical specs for new products – system, software, hardware, mechanical, installation etc. Packaging and documentation requirements Servicing and customer support requirements High-level design (HLD) for understanding the product concept Release plan for phase-wise introduction of the new features Skill set for the development team and proposed project plan – cost & duration Target manufacturing price 6. Day-2: Session-2: Introduction to mobile app platforms for IoT Protocol stack of mobile apps for IoT Mobile-to-server integration – what are the factors to look out for What are the intelligent layers that can be introduced at the mobile app level? iBeacon in iOS Windows Azure Linkafy mobile platform for IoT Axeda Xively 7. Day-2: Session-3: Machine learning for intelligent IoT Introduction to machine learning Learning classification techniques Bayesian prediction – preparing a training file Support Vector Machine Image and video analytics for IoT Fraud and alert analytics through IoT Biometric ID integration with IoT Real-time analytics/stream analytics Scalability issues of IoT and machine learning What are the architectural implementations of machine learning for IoT 8.
Day-2: Session-4: Analytics engines for IoT Insight analytics Visualization analytics Structured predictive analytics Unstructured predictive analytics Recommendation engines Pattern detection Rule/scenario discovery – failure, fraud, optimization Root cause discovery 9. Day-3: Session-1: Security in IoT implementation Why security is absolutely essential for IoT Mechanisms of security breaches in the IoT layers Privacy-enhancing technologies Fundamentals of network security Encryption and cryptography implementation for IoT data Security standards for available platforms European legislation for security in IoT platforms Secure booting Device authentication Firewalling and IPS Updates and patches 10. Day-3: Session-2: Database implementation for IoT: cloud-based IoT platforms SQL vs NoSQL – which one is good for your IoT application Open source vs. licensed databases Available M2M cloud platforms Axeda Xively Omega NovoTech Ayla Libelium Cisco M2M platform AT&T M2M platform Google M2M platform 11. Day-3: Session-3: A few common IoT systems Home automation Energy optimization in the home Automotive OBD IoT locks Smart smoke alarms BAC (blood alcohol content) monitoring for drug abusers under probation Pet cams for pet lovers Wearable IoT Mobile parking ticketing systems Indoor location tracking in retail stores Home healthcare Smart sports watches 12. Day-3: Session-4: Big Data for IoT The 4Vs – volume, velocity, variety and veracity of Big Data Why Big Data is important in IoT Big Data vs legacy data in IoT Hadoop for IoT – when and why? Storage techniques for image, geospatial and video data Distributed databases Parallel computing basics for IoT
463779 Data Shrinkage for Government 14 hours Why shrink data Relational databases Introduction Aggregation and disaggregation Normalisation and denormalisation Null values and zeroes Joining data Complex joins Cluster analysis Applications Strengths and weaknesses Measuring distance Hierarchical clustering K-means and derivatives Applications in Government Factor analysis Concepts Exploratory factor analysis Confirmatory factor analysis Principal component analysis Correspondence analysis Software Applications in Government Predictive analytics Timelines and naming conventions Holdout samples Weights of evidence Information value Scorecard building demonstration using a spreadsheet Regression in predictive analytics Logistic regression in predictive analytics Decision Trees in predictive analytics Neural networks Measuring accuracy Applications in Government
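For the k-means section above, a minimal sketch in base R (the built-in USArrests data stands in for a government dataset):

# K-means on standardised variables, with an elbow plot to choose k
x <- scale(USArrests)

set.seed(1)
wss <- sapply(1:8, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")

km <- kmeans(x, centers = 4, nstart = 20)
table(km$cluster)                                                  # cluster sizes
aggregate(USArrests, by = list(cluster = km$cluster), FUN = mean)  # profiles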
417104 Six Sigma Yellow Belt 21 hours Yellow Belt covers the basics of the Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling delegates to take part in and lead team-based waste and defect reduction projects and initiatives. In addition, emphasis is placed on applying the problem-solving tools in daily roles. At the end of the course you will be equipped to look at your immediate team and role, determine what can be improved and create a business improvement project on a selected opportunity that is aligned to customer requirements. You will be able to analyse the process using visualization tools, identify the waste (non-value-adding) components and work to eliminate these from the process. You will apply root cause analysis techniques to identify the underlying causes of defects in the process. The course uses simulations, case study exercises and work-based projects to enable delegates to 'learn through doing'. Notes: This course has a minimum class size of 4. If requested, this course can be delivered in 2 days with some reductions to the course content and level of detail in some areas, notably Customer needs, Graphical analysis and Process handover. An overview of project selection and scoping Understanding customer needs and how they impact project aims Discovering processes using visualisation techniques Understanding the causes of work and how to simplify Finding and removing process waste Graphical analysis to understand process performance Problem solving tools to determine root cause Basic solution creation Piloting & implementation Process handover
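Graphical analysis of process performance often begins with a Pareto view of defect causes; a minimal sketch in base R, with made-up defect counts:

# Pareto analysis of defect causes (illustrative data)
defects <- sort(c(solder = 42, misalign = 18, scratch = 9,
                  missing = 6, other = 4), decreasing = TRUE)

cum_pct <- cumsum(defects) / sum(defects) * 100
bp <- barplot(defects, ylab = "Count", main = "Pareto chart of defects")
lines(bp, cum_pct * max(defects) / 100, type = "b")  # cumulative %, rescaled
round(cum_pct, 1)   # the "vital few" causes stand out quickly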
295297 Podstawy systemów rekomendacyjnych 7 hours The training is aimed at marketing department staff and IT team leaders. Problems and hopes associated with collecting data Information overload Types of collected data The potential of data today and tomorrow Basic Data Mining concepts Recommendation vs. search Searching and filtering Sorting Weighting results Using synonyms Full-text search The Long Tail concept Chris Anderson's idea Arguments of Long Tail critics; Anita Elberse's counterargument Attempting to determine similarity Products Users Documents and web pages Content-based recommendation and similarity measures Cosine distance Euclidean distance between vectors TF-IDF and the notion of term frequency Collaborative filtering Recommendation based on community ratings Using graphs Capabilities of graphs Determining graph similarity Recommendation based on relationships between users Neural networks How they work Training data An example application of neural networks to recommender systems in HR Encouraging users to share information Convenience of the service Easier navigation Functionality and UX Recommender systems around the world Problems and popularity of recommender systems Successful deployments of recommender systems Examples based on popular services
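The cosine measure mentioned above reduces to a few lines of R; the toy rating matrix is hypothetical:

# Cosine similarity between user rating vectors (content-based measures
# use the same formula on item feature vectors).
ratings <- matrix(c(5, 3, 0, 1,
                    4, 0, 0, 1,
                    1, 1, 0, 5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("userA", "userB", "userC"), NULL))

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

cosine(ratings["userA", ], ratings["userB", ])  # similar tastes: ~0.86
cosine(ratings["userA", ], ratings["userC", ])  # less similar:  ~0.42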
463780 Statistical and Econometric Modelling 21 hours The Nature of Econometrics and Economic Data Econometrics and models Steps in econometric modelling Types of economic data: time series, cross-sectional, panel Causality in econometric analysis Specification and Data Issues Functional form Proxy variables Measurement error in variables Missing data, outliers, influential observations Regression Analysis Estimation Ordinary least squares (OLS) estimators Classical OLS assumptions, Gauss-Markov Theorem Best Linear Unbiased Estimators Inference Testing statistical significance of parameters, t-test (single, group) Confidence intervals Testing multiple linear restrictions, F-test Goodness of fit Testing functional form Missing variables Binary variables Testing for violation of assumptions and their implications: Heteroscedasticity Autocorrelation Multicollinearity Endogeneity Other Estimation Techniques Instrumental Variables Estimation Generalised Least Squares Maximum Likelihood Generalised Method of Moments Models for Binary Response Variables Linear Probability Model Probit Model Logit Model Estimation Interpretation of parameters, Marginal Effects Goodness of Fit Limited Dependent Variables Tobit Model Truncated Normal Distribution Interpretation of the Tobit Model Specification and Estimation Issues Time Series Models Characteristics of Time Series Decomposition of Time Series Exponential Smoothing Stationarity ARIMA models Co-integration ECM model Predictive Analysis Forecasting, Planning and Goals Steps in Forecasting Evaluating Forecast Accuracy Residual Diagnostics Prediction Intervals
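As a small illustration of the estimation and diagnostics topics, an OLS fit with a heteroscedasticity test in R (bptest comes from the lmtest package; the built-in mtcars data is illustrative only):

# OLS estimation, inference and a Breusch-Pagan test
library(lmtest)

fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)      # t-tests on single parameters, R-squared, F-statistic
confint(fit)      # confidence intervals for the coefficients
bptest(fit)       # H0: homoscedastic errors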
417023 Introduction to Machine Learning 7 hours This training course is for people who would like to apply basic Machine Learning techniques in practical applications. Audience Data scientists and statisticians who have some familiarity with machine learning and know how to program in R. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give a practical introduction to machine learning to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additive Models Classification KNN Ridge regression Clustering
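For example, the KNN item above can be tried in a few lines with the class package (shipped with standard R distributions), using the built-in iris data:

# k-nearest-neighbour classification on a random train/test split
library(class)

set.seed(7)
idx  <- sample(nrow(iris), 100)
pred <- knn(train = iris[idx, 1:4], test = iris[-idx, 1:4],
            cl = iris$Species[idx], k = 5)
table(predicted = pred, actual = iris$Species[-idx])  # confusion matrix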
296356 Fundamentals of Cassandra DB 21 hours This course introduces the basics of Cassandra 2.0 including its installation & configuration, internal architecture, tools, Cassandra Query Language, and administration. Audience Administrators and developers seeking to use Cassandra. This course serves as a foundation and prerequisite for other advanced Cassandra courses.   Introduction to Cassandra Big Data Common use cases of Cassandra Cassandra architecture Installation and Configuration Running and Stopping Cassandra instance Cassandra Data Model Cassandra Query Language Configuring the Cassandra nodes and clusters using CCM cqlsh shell commands nodetool Using cassandra-stress to populate and test the Cassandra nodes Coordinating the Cassandra requests Replication Consistency Tuning Cassandra Nodes Communication Writing and Reading data to/from the storage engine Data directories Anti-entropy operations Cassandra Compaction Choosing and Implementing compaction strategies Best practices in hardware planning Troubleshooting resources
165011 Analiza statystyczna w badaniach rynku 28 hours Goal: Refining the toolkit of researchers studying consumer behaviour for products and services. Audience: Researchers, market analysts, managers and staff of marketing and sales departments (primarily in the pharmaceutical and FMCG industries), students of social and economic degree programmes, and anyone interested in market research. Module 1. Quantitative research Initial processing of results checking the correctness of the database checking for missing data weighting observations Statistical models multiple regression conjoint analysis classification trees Automating procedures in tracking studies Analysing data from a marketing experiment Reporting and drawing conclusions Module 2. Qualitative research Transforming qualitative data into quantitative form Statistical models for qualitative data
417025 Numerical Methods 14 hours This course is for data scientists and statisticians who have some familiarity with numerical methods and know at least one programming language from R, Python, Octave, and some C++ options. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose of this course is to give a practical introduction to numerical methods to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. Topics covered: curve fitting regression robust regression linear algebra: matrix operations, eigenvalues/eigenvectors, matrix decompositions ordinary & partial differential equations Fourier analysis interpolation & splines
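A flavour of the curve-fitting and interpolation topics in base R (the noisy sine data is made up for the example):

# Cubic-spline interpolation and a least-squares polynomial fit
set.seed(3)
x <- seq(0, 2 * pi, length.out = 10)
y <- sin(x) + rnorm(10, sd = 0.05)   # noisy samples of sin(x)

f <- splinefun(x, y)                 # interpolating cubic spline
f(1.5)                               # evaluate between the knots

fit <- lm(y ~ poly(x, 3))            # degree-3 least-squares fit
plot(x, y); curve(f(x), add = TRUE)  # data plus spline
lines(x, fitted(fit), lty = 2)       # polynomial fit for comparison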
417006 Hadoop Administration on MapR 28 hours Audience: IT professionals who aspire to get involved in the 'Big Data' world or require knowledge of open source NoSQL solutions. This course is intended to demystify Big Data/Hadoop technology and to show that it is not difficult to understand. Big Data Overview: What is Big Data Why Big Data is gaining popularity Big Data case studies Big Data characteristics Solutions for working with Big Data Hadoop & its components: What Hadoop is and what its components are Hadoop architecture and the characteristics of the data it can handle/process A brief history of Hadoop, companies using it and why they started using it The Hadoop framework and its components, explained in detail What HDFS is, and reads and writes to the Hadoop Distributed File System How to set up a Hadoop cluster in different modes: stand-alone/pseudo/multi-node cluster (this includes setting up a Hadoop cluster in VirtualBox/VMware, the network configurations that need to be carefully looked into, running the Hadoop daemons and testing the cluster) What the MapReduce framework is and how it works Running MapReduce jobs on a Hadoop cluster Understanding replication, mirroring and rack awareness in the context of Hadoop clusters Hadoop Cluster Planning: How to plan your Hadoop cluster Understanding hardware and software for planning your Hadoop cluster Understanding workloads and planning a cluster to avoid failures and perform optimally What is MapR and why MapR: Overview of MapR and its architecture Understanding & working with the MapR Control System, MapR volumes, snapshots & mirrors Planning a cluster in the context of MapR Comparison of MapR with other distributions and Apache Hadoop MapR installation and cluster deployment Cluster Setup & Administration: Managing services, nodes, snapshots, mirror volumes and remote clusters Understanding and managing nodes Understanding Hadoop components, installing Hadoop components alongside MapR services Accessing data on the cluster, including via NFS Managing services & nodes Managing data using volumes, managing users and groups, managing & assigning roles to nodes, commissioning and decommissioning of nodes, cluster administration and performance monitoring, configuring/analyzing and monitoring metrics to monitor performance, configuring and administering MapR security Understanding and working with M7 – native storage for MapR tables Cluster configuration and tuning for optimum performance Cluster upgrade and integration with other setups: Upgrading the software version of MapR and types of upgrade Configuring a MapR cluster to access an HDFS cluster Setting up a MapR cluster on Amazon Elastic MapReduce All the above topics include demonstrations and practice sessions for learners to gain hands-on experience of the technology.
165012 Statystyka zaawansowana z wykorzystaniem SPSS Predictive Analytics SoftWare. 28 hours Goal: Mastering independent work with SPSS at an advanced level, using both dialog boxes and the syntax command language, across selected analytical techniques. Audience: Analysts, researchers, scientists, students and anyone who wants to learn to use the SPSS package at an advanced level and to get to know selected statistical models. The training addresses universal analytical topics and is not dedicated to any specific industry. Preparing a database for analysis managing the dataset operations on variables selected variable transformation functions (logarithmic, power, etc.) Parametric and non-parametric statistics, or how to fit a model to the data measurement scales distribution types outliers and influential observations sample size the central limit theorem Examining differences between statistical characteristics tests based on the mean and the median Analysis of interdependence and similarity correlations principal component analysis cluster analysis Prediction – univariate and multivariate regression analysis the least squares method linear and non-linear models instrumental variables in regression models (dummy, effect, orthogonal coding) Statistical inference
417094 Six Sigma Black Belt 84 hours Six Sigma is a data-driven approach that tackles variation to improve the performance of products, services and processes, combining practical problem solving and the best scientific approaches found in experimentation and optimisation of systems. The approach has been widely and successfully applied in industry, notably by Motorola, AlliedSignal & General Electric. Black Belt is a qualification for improvement managers in a Six Sigma organisation. You will learn the tools and techniques to take an improvement project through the Define, Measure, Analyse, Improve and Control (DMAIC) phases. These techniques include Process Mapping, Measurement System Evaluation, Regression Analysis, Design of Experiments, Statistical Tolerancing, Monte Carlo Simulation and Lean Thinking. The content of the course takes the participants through the DMAIC phases, as well as introducing subjects such as Lean Thinking and Design for Six Sigma and discussing important leadership issues and experiences in deploying a Six Sigma programme. Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling participants to take part in and lead waste and defect reduction projects and initiatives. Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well-scoped process improvement projects related to their regular job function. Week 3 Expert: provides regression, design of experiments and data analysis techniques to enable participants to tackle complex problem-solving projects that require understanding of the relationships between multiple variables. The trainer has 16 years' experience with Six Sigma; as well as leading the deployment of Six Sigma at a number of businesses, he has trained and coached over 300 Black Belts. Here are a few comments from previous participants: “Probably the most valuable course I will ever pass” “The content was very well delivered. The examples very relevant. Thank you” “The course was excellent and I am able to use part of it to coach my lean teams here” (Company supervisor who attended with KTP associate) Block 1 Day 1 Introduction to Six Sigma Project Chartering & VOC Process Mapping Stakeholder analysis Day 2 Team Start Up Prioritisation Matrix Lean Thinking Value Stream Mapping Day 3 Data Collection Minitab and Graphical Analysis Descriptive Statistics Day 4 Measurement System Evaluation Process Capability Cp, CpK Six Sigma Metrics Day 5 5 Why FMEA Block 2 Day 1 Review of Block 1 Multivari Inferential Statistics Intro to Hypothesis Testing Day 2 2 sample t-tests F tests Hypothesis Testing – Chi Sq Day 3 Hypothesis Testing – ANOVA Day 4 Correlation and Regression Multiple Regression Introduction to Design of Experiments Day 5 Mistake Proofing Control Plans Control Charts Block 3 Day 1 Review of Block 2 2K Factorial Experiments Box-Cox Transformations Hypothesis Testing – Non-Parametric Day 2 2K Factorial Experiments Fractional Factorial Experiments Day 3 Noise Blocking Robustness Centre Points General Full Factorial Experiments Day 4 Response Surface Experiments Implementing Improvements Creative Solutions Day 5 Intro to Design for Six Sigma Statistical Tolerancing Monte Carlo Simulation Certification Six Sigma is a practical qualification; to demonstrate knowledge of what has been learnt on the course you will need to undertake 2 coursework projects.
There is no report to produce, but you will be required to give a PowerPoint presentation to the trainer and examiner showing results and method. The projects can cover work you would complete in your normal job; however, you will need to show use of the DMAIC problem-solving approach and application of Six Sigma and Lean tools. This provides a good balance between the practical approach and more rigorous analysis, which together lead to robust solutions. You will be able to contact the trainer to discuss how Six Sigma tools could benefit you in your project. Examples of projects from previous participants include: Formulating cream texture for seasonality in dairy feeds Housing association complaints reduction Multi-variable (cost, efficiency, size) optimisation of a fuel cell Job scheduling improvement in a factory Ambulance waiting time reduction Reduction in resin thickness variation in glass manufacture NobleProg & Redlands provide Black Belt certification. For delegates who require independent accreditation, NobleProg & Redlands have partnered with the British Quality Foundation (BQF) to provide Lean Six Sigma Black Belt certification. Certification requires passing an exam at the end of the course and completing and presenting two improvement projects that demonstrate understanding and application of the Six Sigma approach and techniques. An additional charge of £600 plus VAT is levied for BQF independent accreditation.
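To illustrate the Statistical Tolerancing and Monte Carlo Simulation topics from Block 3, here is a minimal simulation in R; the part dimensions and specification limits are invented for the example:

# Monte Carlo tolerancing: stack three normally varying parts and
# estimate the fraction of assemblies outside a 30 +/- 0.25 spec.
set.seed(2024)
n  <- 1e5
a  <- rnorm(n, mean = 10, sd = 0.05)
b  <- rnorm(n, mean = 12, sd = 0.06)
c_ <- rnorm(n, mean =  8, sd = 0.04)

stack <- a + b + c_
mean(stack < 29.75 | stack > 30.25)   # estimated out-of-spec proportion
hist(stack, breaks = 50, main = "Simulated assembly length")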
417007 Apache Spark 14 hours Why Spark? Problems with Traditional Large-Scale Systems Introducing Spark Spark Basics What is Apache Spark? Using the Spark Shell Resilient Distributed Datasets (RDDs) Functional Programming with Spark Working with RDDs RDD Operations Key-Value Pair RDDs MapReduce and Pair RDD Operations The Hadoop Distributed File System Why HDFS? HDFS Architecture Using HDFS Running Spark on a Cluster Overview A Spark Standalone Cluster The Spark Standalone Web UI Parallel Programming with Spark RDD Partitions and HDFS Data Locality Working With Partitions Executing Parallel Operations Caching and Persistence RDD Lineage Caching Overview Distributed Persistence Writing Spark Applications Spark Applications vs. Spark Shell Creating the SparkContext Configuring Spark Properties Building and Running a Spark Application Logging Spark, Hadoop, and the Enterprise Data Center Overview Spark and the Hadoop Ecosystem Spark and MapReduce Spark Streaming Spark Streaming Overview Example: Streaming Word Count Other Streaming Operations Sliding Window Operations Developing Spark Streaming Applications Common Spark Algorithms Iterative Algorithms Graph Analysis Machine Learning Improving Spark Performance Shared Variables: Broadcast Variables Shared Variables: Accumulators Common Performance Issues
165013 Statystyka z SPSS Predictive Analytics SoftWare 14 hours Goal: Mastering independent work with the SPSS program. Audience: Analysts, researchers, scientists, students and anyone who wants to learn to use the SPSS package and get to know popular data exploration techniques. Using the program Dialog boxes entering/loading data the concept of a variable and measurement scales preparing a database generating tables and charts formatting a report The syntax command language automating analyses saving and modifying procedures creating your own analytical procedures Data analysis Descriptive statistics key terms: variable, hypothesis, statistical significance, etc. measures of central tendency measures of dispersion distributions of statistical characteristics standardisation Introduction to examining relationships between variables correlational vs experimental methods Summary: case study and discussion
417095 Six Sigma Green Belt 70 hours Green Belts participate in and lead Lean and Six Sigma projects from within their regular job function. They can tackle projects as part of a cross functional team or projects scoped within their normal job. Each session of Green Belt training is separated by 3 or 4 weeks when the Green Belts apply their training to their improvement projects. We recommend supporting the Green Belts on their projects in between training sessions and holding stage gate reviews along with leadership and Lean Six Sigma Champions to ensure DMAIC methodology is being rigorously applied. Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach enabling participants to take part and lead waste and defect reduction projects and initiatives. Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well scoped process improvement projects related to their regular job function. Block 1 Day 1 Introduction to Six Sigma Project Chartering & VOC Process Mapping Stakeholder analysis Day 2 Team Start Up Prioritisation Matrix Lean Thinking Value Stream Mapping Day 3 Data Collection Minitab and Graphical Analysis Descriptive Statistics Day 4 Measurement System Evaluation Process Capability Cp, CpK Six Sigma Metrics Day 5 5 Why FMEA Block 2 Day 1 Review of Block 1 Multivari Inferential Statistics Intro to Hypothesis Testing Day 2 2 sample t-tests F tests Hypothesis Testing – Chi Sq Day 3 Hypothesis Testing - Anova Day 4 Correlation and Regression Multiple Regression Introduction to Design Of Experiments Day 5 Mistake Proofing Control Plans Control Charts
417024 Apache Mahout for Developers 14 hours Audience Developers involved in projects that use machine learning with Apache Mahout. Format Hands-on introduction to machine learning. The course is delivered in a lab format based on real-world practical use cases. Implementing Recommendation Systems with Mahout Introduction to recommender systems Representing recommender data Making recommendations Optimizing recommendations Clustering Basics of clustering Data representation Clustering algorithms Clustering quality improvements Optimizing clustering implementations Applications of clustering in the real world Classification Basics of classification Classifier training Classifier quality improvements
165107 Xcelsius 14 hours Description: In the Xcelsius training, participants will use Xcelsius to create interactive visualizations that present complex data in a simple way, and to perform analyses that support key decisions. Participants will also create dashboards presenting company, project and human resources information, all consolidated and displayed in a user-friendly way. Finally, participants will learn how to publish dashboards to various formats such as Adobe Flash, Microsoft Office PowerPoint, Adobe PDF and the web. Objectives: After successfully completing the course, participants will be able to: Explore the Xcelsius workspace and an existing dashboard Create simple visualizations Perform data analysis using Xcelsius components that add dynamic functionality to specific data Create a project management dashboard Create a dashboard for consolidating and presenting an organization's human resources information Finalize dashboards and export them to various file formats Audience: This course is intended for professionals who perform data analysis and need to present solid, timely data in an interactive display. 1: Getting started with Xcelsius 1A: Explore the Xcelsius interface 1B: Explore a dashboard 2: Creating simple, interactive visualizations 2A: Create a simple Xcelsius chart 2B: Manage personal finances using a value field 2C: Organize levels of information using filters 2D: Perform comparative analysis using a list builder and a line chart 3: Performing data analysis 3A: Perform trend analysis using a combo box 3B: Perform demand analysis using a label-based menu 3C: Perform region-based demand analysis using maps 3D: Forecast revenue using sliders and set the scale 4: Creating a project management dashboard 4A: Get detailed status of current projects using drill-down functionality 4B: Analyse resource efficiency using the Fisheye Picture menu and other tools 4C: Analyse resource utilization using a combination chart 5: Creating a human resources dashboard 5A: Create a dashboard using an organization chart 5B: Perform customer loss analysis 6: Finalizing dashboards 6A: Create shortcuts 6B: Publish dashboards
463701 Octave for Data Analysis 14 hours Audience: This course is for data scientists and statisticians who have some familiarity with statistical methods and would like to use the Octave programming language at work. The purpose of this course is to give a practical introduction to Octave programming to participants interested in using this programming language at work. environment data types: numeric, string, arrays matrices variables expressions control flow functions exception handling debugging input/output linear algebra optimization statistical distributions regression plotting
417030 Programowanie w języku F# 7 hours The training is aimed at developers, analysts and anyone who wants to learn the basics and capabilities of the F# language on the .NET platform. Introduction What F# is and what the .NET platform offers Installing F# and an IDE Using the REPL console Creating and running a first program Functional programming The paradigm and pillars of object-oriented programming The functional programming paradigm The notions of state and time in the two paradigms The notion of a function Functions as first-class citizens Closures, lambdas, anonymous functions Recursion Data types in functional programming Basic F# constructs F# values and binding names to them Immutability of values Operators Basic information about functions Controlling program flow Literals Data types Functions Arguments, encapsulation and return values Solving problems with recursion Using closures, lambdas and anonymous functions Currying Elements of object-oriented programming Classes Properties Methods Composition and delegation F# in practice Working with data sets and visualization Financial calculations Integration with F# libraries Testing
206532 Statystyka dla Naukowców 35 hours The course aims to help researchers understand the principles of statistical design and analysis and their relevance to research in a range of scientific disciplines. It covers some probability and statistical methods, mainly through examples. The training contains around 30% lectures and 70% guided quizzes and labs. In the case of a closed course we can tailor the examples and materials to a specific branch (like psychology tests, public sector, biology, genetics, etc.); in the case of public courses mixed examples are used. Though various software is used during this course (from Microsoft Excel to SPSS, Statgraphics, etc.), its main focus is on understanding the principles and processes guiding research, reasoning and conclusions. This course can be delivered as a blended course, i.e. with homework and assignments. Scientific Method, Probability & Statistics A very short history of statistics Why we can be "confident" about conclusions Probability and decision making Preparation for research (deciding "what" and "how") The big picture: research is a part of a process with inputs and outputs Gathering data Questionnaires and measurement What to measure Observational Studies Design of Experiments Analysis of Data and Graphical Methods Research Skills and Techniques Research Management Describing Bivariate Data Introduction to Bivariate Data Values of the Pearson Correlation Guessing Correlations Simulation Properties of Pearson's r Computing Pearson's r Restriction of Range Demo Variance Sum Law II Exercises Probability Introduction Basic Concepts Conditional Probability Demo Gambler's Fallacy Simulation Birthday Demonstration Binomial Distribution Binomial Demonstration Base Rates Bayes' Theorem Demonstration Monty Hall Problem Demonstration Exercises Normal Distributions Introduction History Areas of Normal Distributions Varieties of Normal Distribution Demo Standard Normal Normal Approximation to the Binomial Normal Approximation Demo Exercises Sampling Distributions Introduction Basic Demo Sample Size Demo Central Limit Theorem Demo Sampling Distribution of the Mean Sampling Distribution of Difference Between Means Sampling Distribution of Pearson's r Sampling Distribution of a Proportion Exercises Estimation Introduction Degrees of Freedom Characteristics of Estimators Bias and Variability Simulation Confidence Intervals Introduction Confidence Interval for the Mean t distribution Confidence Interval Simulation Confidence Interval for the Difference Between Means Confidence Interval for Pearson's Correlation Confidence Interval for a Proportion Exercises Logic of Hypothesis Testing Introduction Significance Testing Type I and Type II Errors One- and Two-Tailed Tests Interpreting Significant Results Interpreting Non-Significant Results Steps in Hypothesis Testing Significance Testing and Confidence Intervals Misconceptions Exercises Testing Means Single Mean t Distribution Demo Difference between Two Means (Independent Groups) Robustness Simulation All Pairwise Comparisons Among Means Specific Comparisons Difference between Two Means (Correlated Pairs) Correlated t Simulation Specific Comparisons (Correlated Observations) Pairwise Comparisons (Correlated Observations) Exercises Power Introduction Example Calculations Power Demo 1 Power Demo 2 Factors Affecting Power Exercises Prediction Introduction to Simple Linear Regression Linear Fit Demo Partitioning Sums of Squares Standard Error of the Estimate Prediction Line Demo Inferential Statistics for b and r Exercises ANOVA
Introduction ANOVA Designs One-Factor ANOVA (Between-Subjects) One-Way Demo Multi-Factor ANOVA (Between-Subjects) Unequal Sample Sizes Tests Supplementing ANOVA Within-Subjects ANOVA Power of Within-Subjects Designs Demo Exercises Chi Square Chi Square Distribution One-Way Tables Testing Distributions Demo Contingency Tables 2 x 2 Table Simulation Exercises Case Studies Analysis of selected case studies
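Several of the "Testing Means" topics above can be reproduced in a couple of lines of base R, using built-in datasets for illustration:

# Significance tests and confidence intervals
g1 <- sleep$extra[sleep$group == 1]   # built-in paired drug-effect data
g2 <- sleep$extra[sleep$group == 2]

t.test(g1, g2)                  # difference between two means, independent
t.test(g1, g2, paired = TRUE)   # correlated-pairs version

cor.test(mtcars$wt, mtcars$mpg) # confidence interval for Pearson's r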
209796 Analiza Marketingowa w R 21 hours Audience: Business owners (marketing managers, product managers, customer base managers) and their teams; customer insights professionals. Overview: The course follows the customer life cycle from acquiring new customers, managing the existing customers for profitability, retaining good customers, and finally understanding which customers are leaving us and why. We will be working with real (if anonymized) data from a variety of industries including telecommunications, insurance, media, and high tech. Format: Instructor-led training over the course of five half-day sessions with in-class exercises as well as homework. It can be delivered as a classroom or distance (online) course. Part 1: Inflow – acquiring new customers Our focus is direct marketing, so we will not look at advertising campaigns but instead focus on understanding marketing campaigns (e.g. direct mail). This is the foundation for almost everything else in the course. We look at measuring and improving campaign effectiveness, including: The importance of test and control groups. Universal control group. Techniques: lift curves, AUC. Return on investment. Optimizing marketing spend. Part 2: Base Management – managing existing customers Considering the cost of acquiring new customers, for many businesses there are probably few assets more valuable than their existing customer base, though few think of it in this way. Topics include: 1. Cross-selling and up-selling: a. Offering the right product or service to the customer at the right time. Techniques: RFM models. Multinomial regression. b. Value of lifetime purchases. 2. Customer segmentation: understanding the types of customers that you have. Classification models using first simple decision trees, and then random forests and other, newer techniques. Part 3: Retention – keeping your good customers Understanding which customers are likely to leave and what you can do about it is key to profitability in many industries, especially where there are repeat purchases or subscriptions. We look at propensity-to-churn models, including: Logistic regression: glm (package stats) and newer techniques (especially gbm as a general tool) Tuning models (caret) and an introduction to ensemble models. Part 4: Outflow – understanding who is leaving and why Customers will leave you – that is a fact of life. What is important is to understand who is leaving and why. Is it low-value customers who are leaving, or is it your best customers? Are they leaving for competitors, or because they no longer need your products and services? Topics include: Customer lifetime value models: combining the value of purchases with propensity to churn and the cost of servicing and retaining the customer. Analysing survey data. (Generally useful, but we will do a brief introduction here in the context of exit surveys.)
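A minimal RFM scoring sketch in R, of the kind used for the cross-selling topics in Part 2; the orders data frame (columns customer_id, order_date, amount) is hypothetical:

# Recency-Frequency-Monetary scores with dplyr
library(dplyr)

rfm <- orders |>
  group_by(customer_id) |>
  summarise(recency   = as.numeric(Sys.Date() - max(order_date)),
            frequency = n(),
            monetary  = sum(amount)) |>
  mutate(R = 6 - ntile(recency, 5),   # recent buyers score high
         F = ntile(frequency, 5),
         M = ntile(monetary, 5),
         RFM = paste0(R, F, M))       # e.g. "555" = best customers

head(arrange(rfm, desc(RFM)))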
417029 From Data to Decision with Big Data and Predictive Analytics 21 hours Audience If you try to make sense of the data you have access to, or want to analyse unstructured data available on the net (like Twitter, LinkedIn, etc.), this course is for you. It is mostly aimed at decision makers and people who need to choose which data is worth collecting and which is worth analyzing. It is not aimed at people configuring the solution; those people will benefit from the big picture though. Delivery Mode During the course delegates will be presented with working examples of mostly open source technologies. Short lectures will be followed by presentations and simple exercises by the participants. Content and Software Used All software used is updated each time the course is run, so we check the newest versions possible. The course covers the process from obtaining, formatting, processing and analysing the data to automating the decision-making process with machine learning. Quick Overview Data sources Mining data Recommender systems Target marketing Data Types Structured vs unstructured Static vs streamed Attitudinal, behavioural and demographic data Data-driven vs user-driven analytics Data validity Volume, velocity and variety of data Models Building models Statistical models Machine learning Data Classification Clustering k-groups, k-means, nearest neighbours Ant colonies, birds flocking Predictive Models Decision trees Support vector machines Naive Bayes classification Neural networks Markov models Regression Ensemble methods ROI Benefit/cost ratio Cost of software Cost of development Potential benefits Building Models Data preparation (MapReduce) Data cleansing Choosing methods Developing the model Testing the model Model evaluation Model deployment and integration Overview of Open Source and Commercial Software Selection of R-project packages Python libraries Hadoop and Mahout Selected Apache projects related to Big Data and analytics Selected commercial solutions Integration with existing software and data sources
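For instance, the decision-tree model in the list above takes only a few lines with the rpart package (shipped with R); kyphosis is rpart's own example dataset:

# A classification tree and its complexity table
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")
printcp(fit)                        # cross-validated error vs tree size
plot(fit); text(fit, use.n = TRUE)  # quick look at the splits
head(predict(fit, type = "class"))  # class predictions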
20299 Wprowadzenie do R 21 hours R is an open-source, free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining. This course covers the manipulation of objects in R, including reading data, accessing R packages, writing R functions, and making informative graphs. It includes analyzing data using common statistical models. The course teaches how to use the R software (http://www.r-project.org) both on a command line and in a graphical user interface (GUI). Introduction and preliminaries Making R more friendly, R and available GUIs The R environment Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Ordered and unordered factors A specific example The function tapply() and ragged arrays Ordered factors Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function Mixed vector and array arithmetic. The recycling rule The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Frequency tables from factors Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Reading data from files The read.table() function The scan() function Accessing builtin datasets Loading data from other R packages Editing data Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...'
argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Statistical models in R Defining statistical models; formulae Contrasts Linear models Generic functions for extracting model information Analysis of variance and model comparison ANOVA tables Updating fitted models Generalized linear models Families The glm() function Nonlinear least squares and maximum likelihood models Least squares Maximum likelihood Some non-standard models Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Low-level plotting commands Mathematical annotation Hershey vector fonts Interacting with graphics Using graphics parameters Permanent changes: The par() function Temporary changes: Arguments to graphics functions Graphics parameters list Graphical elements Axes and tick marks Figure margins Multiple figure environment Device drivers PostScript diagrams for typeset documents Multiple graphics devices Dynamic graphics Packages Standard packages Contributed packages and CRAN Namespaces
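To make the outline above concrete, the following is a minimal sketch of the kind of object manipulation the course walks through; the specific values are invented for illustration.

    x <- c(2, 4, 6, 8)                    # create a numeric vector with c()
    x[x > 3]                              # a logical index vector selects a subset
    m <- matrix(1:6, nrow = 2)            # a 2 x 3 matrix
    t(m) %*% m                            # transpose and matrix multiplication
    df <- data.frame(id = 1:3, score = c(10, 20, 15))  # a small data frame
    mean(df$score)                        # a summary statistic on one column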
110939 R for Data Analysts and Scientists 7 hours Audience managers developers scientists students Format of the course on-line instruction and discussion OR face-to-face workshops The list below gives an idea of the topics that will be covered in the workshop. The number of topics covered depends on the duration of the workshop (i.e. one, two or three days). In a one- or two-day workshop it may not be possible to cover all topics, so the workshop will be tailored to suit the specific needs of the learners. A first R session Syntax for analysing one-dimensional data arrays Syntax for analysing two-dimensional data arrays Reading and writing data files Subsetting data; sorting, ranking and ordering data Merging arrays Set membership The main statistical functions in R The Normal Distribution (correlation, probabilities, tests for normality and confidence intervals) Ordinary Least Squares Regression T-tests, Analysis of Variance and Multivariable Analysis of Variance Chi-square tests for categorical variables Writing functions in R Writing software (scripts) in R Control structures (e.g. loops) Graphical methods (including scatterplots, bar charts, pie charts, histograms, box plots and dot charts) Graphical User Interfaces for R
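A brief sketch of some of the statistical functions listed above, using R's built-in mtcars dataset; the dataset and the model formulas are assumptions chosen for the example.

    data(mtcars)
    t.test(mpg ~ am, data = mtcars)           # two-sample t-test of mileage by transmission type
    fit <- lm(mpg ~ wt + hp, data = mtcars)   # ordinary least squares regression
    summary(fit)                              # coefficients, p-values and R-squared
    chisq.test(table(mtcars$cyl, mtcars$am))  # chi-square test on a contingency table
    hist(mtcars$mpg)                          # one of the graphical methods mentioned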
86917 Forecasting in R 14 hours This course allows delegates to fully automate the process of forecasting with R. Forecasting with R Introduction to Forecasting Exponential Smoothing ARIMA models The forecast package Package 'forecast' accuracy Acf arfima Arima arima.errors auto.arima bats BoxCox BoxCox.lambda croston CV dm.test dshw ets fitted.Arima forecast forecast.Arima forecast.bats forecast.ets forecast.HoltWinters forecast.lm forecast.stl forecast.StructTS gas gold logLik.ets ma meanf monthdays msts na.interp naive ndiffs nnetar plot.bats plot.ets plot.forecast rwf seasadj seasonaldummy seasonplot ses simulate.ets sindexf splinef subset.ts taylor tbats thetaf tsdisplay tslm wineind woolyrnq
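A minimal sketch of the automated forecasting workflow the course describes, using a few of the 'forecast' package functions listed above (assumes the package is installed from CRAN; AirPassengers is a built-in example series).

    library(forecast)                     # load the forecasting package
    fit <- auto.arima(AirPassengers)      # automatic ARIMA model selection
    fc <- forecast(fit, h = 12)           # forecast 12 months ahead
    accuracy(fit)                         # in-sample accuracy measures
    plot(fc)                              # point forecasts with prediction intervals

Because auto.arima() chooses the model order itself, the same few lines can be re-run on new data without manual intervention, which is the sense in which the forecasting process can be fully automated.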
