Big Data Training Courses

Big Data is a term used for solutions designed to store and process large data sets. Big Data solutions were pioneered by Google, although many open-source implementations are now available, such as Apache Hadoop, Cassandra and Cloudera Impala. According to reports published by Gartner, Big Data is the next big step in the IT industry, right after cloud computing, and will be a leading trend for the next several years.

Subcategories

Big Data Course Outlines

ID Name Duration (7 clock hours per day) Overview
463779 Data Shrinkage for Government 14 hours Why shrink data Relational databases Introduction Aggregation and disaggregation Normalisation and denormalisation Null values and zeroes Joining data Complex joins Cluster analysis Applications Strengths and weaknesses Measuring distance Hierarchical clustering K-means and derivatives Applications in Government Factor analysis Concepts Exploratory factor analysis Confirmatory factor analysis Principal component analysis Correspondence analysis Software Applications in Government Predictive analytics Timelines and naming conventions Holdout samples Weights of evidence Information value Scorecard building demonstration using a spreadsheet Regression in predictive analytics Logistic regression in predictive analytics Decision Trees in predictive analytics Neural networks Measuring accuracy Applications in Government
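The scorecard topics listed above (weights of evidence, information value) reduce to a short calculation. The following is a minimal Python sketch, not part of the course materials; the binned good/bad counts are invented purely for illustration:

```python
import math

def woe_iv(goods, bads):
    """Weight of Evidence and Information Value for binned counts.

    goods[i] / bads[i] are counts of good / bad outcomes in bin i.
    WoE_i = ln( (goods_i / total_goods) / (bads_i / total_bads) )
    IV    = sum_i (goods_i/total_goods - bads_i/total_bads) * WoE_i
    """
    tg, tb = sum(goods), sum(bads)
    woe = [math.log((g / tg) / (b / tb)) for g, b in zip(goods, bads)]
    iv = sum((g / tg - b / tb) * w
             for g, b, w in zip(goods, bads, woe))
    return woe, iv

# Hypothetical age bins for a scorecard: counts of on-time (good)
# vs late (bad) filers per bin.
woe, iv = woe_iv(goods=[80, 150, 70], bads=[40, 30, 30])
```

A variable with an IV around 0.17, as here, would conventionally be read as a medium-strength predictor when building a scorecard.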
417032 Data Mining 21 hours Course can be provided with any tools, including free open-source data mining software and applications. Introduction Data mining as the analysis step of the KDD process ("Knowledge Discovery in Databases") Subfield of computer science Discovering patterns in large data sets Sources of methods Artificial intelligence Machine learning Statistics Database systems What is involved? Database and data management aspects Data pre-processing Model and inference considerations Interestingness metrics Complexity considerations Post-processing of discovered structures Visualization Online updating Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Use and applications Able Danger Behavioral analytics Business analytics Cross Industry Standard Process for Data Mining Customer analytics Data mining in agriculture Data mining in meteorology Educational data mining Human genetic clustering Inference attack Java Data Mining Open-source intelligence Path analysis (computing) Police-enforced ANPR in the UK Reactive business intelligence SEMMA Stellar Wind Talx Zapaday Data dredging, data fishing, data snooping
295297 Fundamentals of Recommender Systems 7 hours The training is aimed at marketing department staff and IT team leaders. Problems and hopes related to data collection Information overload Types of collected data The potential of data today and tomorrow Basic Data Mining concepts Recommendation vs. search Searching and filtering Sorting Weighting results Using synonyms Full-text search The Long Tail concept Chris Anderson's idea Arguments of the Long Tail critics; Anita Elberse's argumentation Determining similarity Products Users Documents and web pages Content-based recommendation and similarity measures Cosine distance Euclidean distance between vectors TF-IDF and the concept of term frequency Collaborative filtering Recommendation based on community ratings Using graphs Capabilities of graphs Determining graph similarity Recommendation based on relationships between users Neural networks How they work Training data An example application of neural networks in recommender systems for HR Encouraging users to share information Ease of use of the service Navigation aids Functionality and UX Recommender systems around the world Problems and popularity of recommender systems Successful recommender system deployments Examples from popular services
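The content-based similarity measures mentioned in this outline (cosine distance, TF-IDF) fit in a few lines of code. Below is a minimal Python sketch, not taken from the course itself; the toy documents and term weights are invented for the example:

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def tf_idf(docs):
    """Naive TF-IDF: tf * log(N / df) per term per document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

docs = [["graph", "neural", "network"],
        ["neural", "network", "training"],
        ["graph", "database"]]
vectors = [Counter(d) for d in docs]
sim = cosine_similarity(vectors[0], vectors[1])   # shares 2 of 3 terms
weights = tf_idf(docs)
```

In a content-based recommender, items whose TF-IDF vectors have the highest cosine similarity to the user's profile vector would be recommended first.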
464020 IoT (Internet of Things) - Technology Overview 7 hours The Internet of Things (IoT) is the concept of a connected network of objects (physical devices) - vehicles, buildings, mobile phones, etc. - which, through embedded electronics, software, sensors and network interfaces, can communicate with each other and exchange and collect data. IoT makes it possible to sense and remotely control devices over existing network infrastructure. It creates the opportunity for more direct integration of the physical world with computer systems, which can result in, for example, improved safety, optimised road traffic flow, smart homes and a host of other tangible business benefits. The training aims to give participants knowledge of IoT and its development trends, and to show what data to collect, how to collect it, and what it can later be used for. 1. Introduction - IoT (Internet of Things) Outline of the concept Application areas Trends: smart phone, smart home, smart city, smart world? 2. IoT Integration with existing systems Security Power supply type Data collection - Cloud Connectivity - wired, wireless WiFi 3G Bluetooth ZigBee RFID/NFC Communication protocols (HTTP, MQTT) Global benefits Scalability Threats 3. Hardware 8-bit - AVR, PIC ARM and related Raspberry Pi/BeagleBone Arduino 4. Example implementation of a temperature sensor Description of components Wiring + software Internet connectivity Sending data to a remote server Analysis of data from the sensor network
417030 Programming in F# 7 hours The training is aimed at programmers, analysts, and anyone wishing to learn the basics and capabilities of the F# language on the .NET platform. Introduction What F# is and what the .NET platform offers Installing F# and an IDE Using the REPL console Creating and running a first program Functional programming The paradigm and pillars of object-oriented programming The functional programming paradigm The notions of state and time in both paradigms The concept of a function First-class types (first-class citizens) Closures, lambdas, anonymous functions Recursion Data types in functional programming Basic F# language constructs F# values and binding names to them Immutability of values Operators Basics of functions Controlling program flow Literals Data types Functions Arguments, encapsulation and return values Solving problems with recursion Using closures, lambdas and anonymous functions Currying Elements of object-oriented programming Classes Properties Methods Composition and delegation F# in practice Working with data sets and visualisation Financial calculations Integration with F# libraries Testing
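Two of the functional concepts in this outline, closures and currying, can be demonstrated in a few lines. The sketch below uses Python for brevity (the course itself uses F#, where functions are curried by default); it is an illustration, not course material:

```python
# A curried add: add(a) returns a new function that closes over `a`.
def add(a):
    def inner(b):        # `inner` is a closure capturing `a`
        return a + b
    return inner

add_five = add(5)        # partial application: fix the first argument
result = add_five(3)     # applies the remaining argument -> 8
```

In F# the equivalent would simply be `let add a b = a + b`, and `add 5` would already yield the partially applied function.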
967109 Apache Solr Administration Essentials 7 hours The training is aimed primarily at system administrators and IT specialists interested in how a Solr / SolrCloud cluster works and how to maintain and manage it. This one-day training focuses on administrative operations, and the exercises have been prepared accordingly. Nevertheless, the workshop covers all the essential information about Apache Solr needed to understand how the technology works and what it is for. After the training, participants will have a general knowledge of Apache Solr and, above all, will know: the basics of indexing, processing and searching documents scaling and performance topics how to maintain an Apache Solr installation advanced configuration of the search engine Introduction What Apache Solr is Feature overview Example use cases Introduction to SolrCloud Preparing the environment Data model The schema Processing text fields Dynamic fields Schemaless mode Indexing Index structure Indexing methods Committing changes Batch operations Searching Building queries Query parsers Boosting Filtering results Facets Scaling Master-slave architecture Configuration Replication Repeater Multiple masters SolrCloud Architecture Building a cluster Managing a cluster Transaction log ZooKeeper Routing Collection API CDCR (Cross Data Center Replication) Cluster maintenance Apache Solr as a service Logs and event logging Monitoring Backup Hardware requirements JVM settings Security Advanced settings Configuration API Solrconfig.xml Schema factory Codec factory Directory factory Index segments Cache
792237 Big Data Storage Solution - NoSQL 14 hours When traditional storage technologies cannot handle the amount of data you need to store, there are hundreds of alternatives. This course guides participants through the alternatives for storing and analysing Big Data and their pros and cons. The course focuses mostly on discussion and presentation of solutions, though hands-on exercises are available on demand. Limits of Traditional Technologies SQL databases Redundancy: replicas and clusters Constraints Speed Overview of database types Object Databases Document Store Cloud Databases Wide Column Store Multidimensional Databases Multivalue Databases Streaming and Time Series Databases Multimodel Databases Graph Databases Key Value XML Databases Distributed file systems Popular NoSQL Databases MongoDB Cassandra Apache Hadoop Apache Spark other solutions NewSQL Overview of available solutions Performance Inconsistencies Document Storage/Search Optimized Solr/Lucene/Elasticsearch other solutions
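The database types this course compares differ mainly in how they shape a record. The plain-Python structures below sketch the same "user" record in three of the listed models; all names and values are invented for illustration and tied to no particular product:

```python
import json

# Key-value store: an opaque value behind a single key; the
# application, not the database, interprets the blob.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: a nested, schema-free document per id; fields
# may vary from one document to the next.
doc_store = {"42": {"name": "Ada",
                    "address": {"city": "London"},
                    "tags": ["admin", "beta"]}}

# Wide column store: row key -> column family -> column -> value,
# in the style of BigTable, HBase or Cassandra.
wide_column = {"42": {"profile": {"name": "Ada", "city": "London"},
                      "activity": {"last_login": "2016-01-01"}}}

profile_name = wide_column["42"]["profile"]["name"]
```

The trade-off the course discusses follows directly from these shapes: the key-value model is fastest but can only fetch whole blobs, while document and wide-column models allow partial reads and per-field indexing at the cost of more complex storage engines.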
463718 Introduction to Neo4j - a Graph Database 7 hours Introduction to Neo4j Installation and configuration Structure of a Neo4j application Relational vs. graph representations of data The graph data model Can and should the problem be represented as a graph? Selected use cases and modelling of a chosen problem Key concepts of the Neo4j graph model: Node Relationship Property Label The Cypher query language and graph operations Creating and managing a schema with Cypher CRUD operations on data Cypher queries and their SQL equivalents Graph algorithms used in Neo4j The REST interface Basic administration Creating and restoring backups Managing the database from the browser Importing and exporting data in common formats
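The key concepts in this outline (node, relationship, property, label) can be mocked up in plain Python to show what a Cypher-style pattern match does. This is an illustrative sketch only, not Neo4j's API; the graph and the `match` helper are invented for the example:

```python
# Toy property graph: nodes carry labels and properties;
# relationships are (start, type, end) triples, mirroring
# Neo4j's data model.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Alice"}},
    2: {"labels": {"Person"}, "props": {"name": "Bob"}},
    3: {"labels": {"Movie"},  "props": {"title": "Heat"}},
}
rels = [(1, "FRIEND_OF", 2), (1, "WATCHED", 3), (2, "WATCHED", 3)]

def match(rel_type, start_label):
    """Rough analogue of: MATCH (a:Label)-[:TYPE]->(b) RETURN a, b."""
    return [(s, e) for s, t, e in rels
            if t == rel_type and start_label in nodes[s]["labels"]]

watched = match("WATCHED", "Person")
```

In Neo4j itself the same question would be a single Cypher query such as `MATCH (p:Person)-[:WATCHED]->(m) RETURN p, m`, with the traversal done by the database rather than by application code.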
1022881 Hadoop for Developers 14 hours Introduction What is Hadoop? What does it do? How does it do it? The Motivation for Hadoop Problems with Traditional Large-Scale Systems Introducing Hadoop Hadoopable Problems Hadoop: Basic Concepts and HDFS The Hadoop Project and Hadoop Components The Hadoop Distributed File System Introduction to MapReduce MapReduce Overview Example: WordCount Mappers Reducers Hadoop Clusters and the Hadoop Ecosystem Hadoop Cluster Overview Hadoop Jobs and Tasks Other Hadoop Ecosystem Components Writing a MapReduce Program in Java Basic MapReduce API Concepts Writing MapReduce Drivers, Mappers, and Reducers in Java Speeding Up Hadoop Development by Using Eclipse Differences Between the Old and New MapReduce APIs Writing a MapReduce Program Using Streaming Writing Mappers and Reducers with the Streaming API Unit Testing MapReduce Programs Unit Testing The JUnit and MRUnit Testing Frameworks Writing Unit Tests with MRUnit Running Unit Tests Delving Deeper into the Hadoop API Using the ToolRunner Class Setting Up and Tearing Down Mappers and Reducers Decreasing the Amount of Intermediate Data with Combiners Accessing HDFS Programmatically Using the Distributed Cache Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners Practical Development Tips and Techniques Strategies for Debugging MapReduce Code Testing MapReduce Code Locally by Using LocalJobRunner Writing and Viewing Log Files Retrieving Job Information with Counters Reusing Objects Creating Map-Only MapReduce Jobs Partitioners and Reducers How Partitioners and Reducers Work Together Determining the Optimal Number of Reducers for a Job Writing Custom Partitioners Data Input and Output Creating Custom Writable and Writable-Comparable Implementations Saving Binary Data Using SequenceFile and Avro Data Files Issues to Consider When Using File Compression Implementing Custom InputFormats and OutputFormats Common MapReduce Algorithms Sorting and Searching Large Data Sets
Indexing Data Computing Term Frequency — Inverse Document Frequency Calculating Word Co-Occurrence Performing Secondary Sort Joining Data Sets in MapReduce Jobs Writing a Map-Side Join Writing a Reduce-Side Join Integrating Hadoop into the Enterprise Workflow Integrating Hadoop into an Existing Enterprise Loading Data from an RDBMS into HDFS by Using Sqoop Managing Real-Time Data Using Flume Accessing HDFS from Legacy Systems with FuseDFS and HttpFS An Introduction to Hive, Impala, and Pig The Motivation for Hive, Impala, and Pig Hive Overview Impala Overview Pig Overview Choosing Between Hive, Impala, and Pig An Introduction to Oozie Introduction to Oozie Creating Oozie Workflows
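The WordCount example named in this outline is the canonical illustration of the map / shuffle-sort / reduce phases. The single-process Python sketch below imitates those phases for intuition only; real Hadoop runs them distributed, in Java or via the Streaming API:

```python
from itertools import groupby

def map_phase(records, mapper):
    """Apply the mapper to every input record; collect (key, value) pairs."""
    return [kv for rec in records for kv in mapper(rec)]

def shuffle_sort(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    pairs.sort(key=lambda kv: kv[0])
    return [(k, [v for _, v in grp])
            for k, grp in groupby(pairs, key=lambda kv: kv[0])]

def reduce_phase(grouped, reducer):
    return dict(reducer(k, vals) for k, vals in grouped)

# WordCount: the mapper emits (word, 1); the reducer sums the counts.
def wc_mapper(line):
    return [(w, 1) for w in line.split()]

def wc_reducer(word, counts):
    return word, sum(counts)

lines = ["big data big clusters", "data pipelines"]
counts = reduce_phase(shuffle_sort(map_phase(lines, wc_mapper)), wc_reducer)
# counts == {"big": 2, "clusters": 1, "data": 2, "pipelines": 1}
```

The combiner topic in the outline fits the same picture: a combiner is simply the reducer run early, on each mapper's local output, to shrink the data crossing the shuffle.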
238322 Preparation for the CCAH (Certified Administrator for Apache Hadoop) Exam 35 hours The course is intended for IT professionals working on solutions that require storing and processing large data sets in distributed systems Course objectives: acquiring knowledge of Apache Hadoop administration preparing for the CCAH (Cloudera Certified Administrator for Apache Hadoop) exam 1: HDFS (38%) The functions of the individual Apache Hadoop daemons Storing and processing data in a Hadoop system When should we choose Hadoop HDFS architecture and operation HDFS Federation HDFS High Availability HDFS security (Kerberos) The file read and write process in HDFS 2: MapReduce (10%) How MapReduce v1 works How MapReduce v2 (YARN) works 3: Planning a Hadoop Cluster (12%) Choosing hardware and an operating system Requirements analysis Tuning kernel parameters and storage configuration Matching the hardware configuration to requirements System scalability: CPU load, RAM, storage (IO) and system capacity Storage scalability: JBOD vs RAID, network disks and the impact of virtualisation on system performance Network topologies: network load in a Hadoop system (HDFS and MapReduce) and connection optimisation 4: Hadoop Cluster Installation and Administration (17%) The impact of failures on cluster operation Log monitoring Basic metrics used by a Hadoop cluster Tools for monitoring a Hadoop cluster Tools for administering a Hadoop cluster 5: Resource Management (6%) Queue architecture and functions Resource allocation with FIFO queues Resource allocation with fair scheduler queues Resource allocation with capacity scheduler queues 6: Monitoring and Logging (12%) Monitoring metrics Managing the NameNode and JobTracker from the Web GUI Configuring log4j How to monitor Hadoop daemons Monitoring CPU usage on key servers in the cluster Monitoring RAM and swap usage Managing and reviewing logs Interpreting logs 7: The Hadoop Ecosystem (5%) Auxiliary tools
806643 Spark for Developers 21 hours OBJECTIVE: This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning and GraphX. AUDIENCE: Developers / Data Analysts Scala primer A quick introduction to Scala Labs: Getting to know Scala Spark Basics Background and history Spark and Hadoop Spark concepts and architecture Spark ecosystem (Core, Spark SQL, MLlib, Streaming) Labs: Installing and running Spark First Look at Spark Running Spark in local mode Spark web UI Spark shell Analyzing dataset – part 1 Inspecting RDDs Labs: Spark shell exploration RDDs RDD concepts Partitions RDD operations / transformations RDD types Key-Value pair RDDs MapReduce on RDDs Caching and persistence Labs: creating & inspecting RDDs; caching RDDs Spark API programming Introduction to the Spark API / RDD API Submitting the first program to Spark Debugging / logging Configuration properties Labs: Programming with the Spark API, submitting jobs Spark SQL SQL support in Spark DataFrames Defining tables and importing datasets Querying data frames using SQL Storage formats: JSON / Parquet Labs: Creating and querying data frames; evaluating data formats MLlib MLlib intro MLlib algorithms Labs: Writing MLlib applications GraphX GraphX library overview GraphX APIs Labs: Processing graph data using Spark Spark Streaming Streaming overview Evaluating streaming platforms Streaming operations Sliding window operations Labs: Writing Spark Streaming applications Spark and Hadoop Hadoop intro (HDFS / YARN) Hadoop + Spark architecture Running Spark on Hadoop YARN Processing HDFS files using Spark Spark Performance and Tuning Broadcast variables Accumulators Memory management & caching Spark Operations Deploying Spark in production Sample deployment templates Configurations Monitoring
Troubleshooting
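The RDD sections of this outline hinge on one idea: transformations are lazy and only actions trigger computation. The toy class below imitates that behaviour in plain single-machine Python for intuition only; it is not Spark's API, and `MiniRDD`, `collect` and `reduce_by_key` are names invented for the sketch:

```python
class MiniRDD:
    """A toy, single-machine stand-in for Spark's RDD: transformations
    are lazy (stored as a pipeline), actions force evaluation."""

    def __init__(self, data, ops=()):
        self._data, self._ops = data, ops

    def map(self, f):                 # transformation: records the op
        return MiniRDD(self._data, self._ops + (("map", f),))

    def filter(self, f):              # transformation: records the op
        return MiniRDD(self._data, self._ops + (("filter", f),))

    def collect(self):                # action: runs the whole pipeline
        out = iter(self._data)
        for kind, f in self._ops:
            out = map(f, out) if kind == "map" else filter(f, out)
        return list(out)

    def reduce_by_key(self, f):       # action over (key, value) pairs
        acc = {}
        for k, v in self.collect():
            acc[k] = f(acc[k], v) if k in acc else v
        return acc

pairs = MiniRDD(["spark", "sql", "spark"]).map(lambda w: (w, 1))
counts = pairs.reduce_by_key(lambda a, b: a + b)
```

In real Spark, `reduceByKey` is itself a transformation and the pipeline additionally spans partitions across a cluster, but the lazy-pipeline-plus-action structure is the same.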
809300 Big Data Hadoop Analyst Training 28 hours Big Data Analyst Training is a hands-on course recommended for anyone who wants to become a Data Scientist. The course focuses on the skills a modern analyst needs to work with Big Data technology. It presents tools for accessing, changing, transforming and analysing complex data structures stored in a Hadoop cluster. The course covers topics from the Hadoop ecosystem (Pig, Hive, Impala, ELK and others): The functionality of Pig, Hive, Impala and ELK for collecting data, saving results and analytics. How Pig, Hive and Impala can improve the performance of typical, everyday analytical tasks. Performing real-time, interactive analyses of huge data sets to extract valuable business insights, and how to interpret the results. Running complex queries on very large data volumes. Hadoop fundamentals. Introduction to Pig. Basic data analysis with Pig. Processing complex data with Pig. Operations on multiple data sets with Pig. Troubleshooting and optimising Pig. Introduction to Hive, Impala and ELK. Running queries in Hive, Impala and ELK. Data management in Hive. Data storage and performance. Analyses with Hive and Impala. Working with Impala and ELK. Analysing text and complex data types. Optimising Hive, Pig, Impala and ELK. Interoperability and workflow. Questions, exercises, certification.
463964 MATLAB Fundamental 21 hours This three-day course provides a comprehensive introduction to the MATLAB technical computing environment. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include: Working with the MATLAB user interface Entering commands and creating variables Analyzing vectors and matrices Visualizing vector and matrix data Working with data files Working with data types Automating commands with scripts Writing programs with logic and flow control Writing functions Part 1 A Brief Introduction to MATLAB Objectives: Offer an overview of what MATLAB is, what it consists of, and what it can do for you An Example: C vs. MATLAB MATLAB Product Overview MATLAB Application Fields What can MATLAB do for you? The Course Outline Working with the MATLAB User Interface Objective: Get an introduction to the main features of the MATLAB integrated design environment and its user interfaces. Get an overview of course themes. MATLAB Interface Reading data from file Saving and loading variables Plotting data Customizing plots Calculating statistics and best-fit line Exporting graphics for use in other applications Variables and Expressions Objective: Enter MATLAB commands, with an emphasis on creating and accessing data in variables. Entering commands Creating variables Getting help Accessing and modifying values in variables Creating character variables Analysis and Visualization with Vectors Objective: Perform mathematical and statistical calculations with vectors, and create basic visualizations. See how MATLAB syntax enables calculations on whole data sets with a single command.
Calculations with vectors Plotting vectors Basic plot options Annotating plots Analysis and Visualization with Matrices Objective: Use matrices as mathematical objects or as collections of (vector) data. Understand the appropriate use of MATLAB syntax to distinguish between these applications. Size and dimensionality Calculations with matrices Statistics with matrix data Plotting multiple columns Reshaping and linear indexing Multidimensional arrays Part 2 Automating Commands with Scripts Objective: Collect MATLAB commands into scripts for ease of reproduction and experimentation. As the complexity of your tasks increases, entering long sequences of commands in the Command Window becomes impractical. A Modelling Example The Command History Creating script files Running scripts Comments and Code Cells Publishing scripts Working with Data Files Objective: Bring data into MATLAB from formatted files. Because imported data can be of a wide variety of types and formats, emphasis is given to working with cell arrays and date formats. Importing data Mixed data types Cell arrays Conversions amongst numerals, strings, and cells Exporting data Multiple Vector Plots Objective: Make more complex vector plots, such as multiple plots, and use color and string manipulation techniques to produce eye-catching visual representations of data. Graphics structure Multiple figures, axes, and plots Plotting equations Using color Customizing plots Logic and Flow Control Objective: Use logical operations, variables, and indexing techniques to create flexible code that can make decisions and adapt to different situations. Explore other programming constructs for repeating sections of code, and constructs that allow interaction with the user. Logical operations and variables Logical indexing Programming constructs Flow control Loops Matrix and Image Visualization Objective: Visualize images and matrix data in two or three dimensions. 
Explore the difference in displaying images and visualizing matrix data using images. Scattered Interpolation using vector and matrix data 3-D matrix visualization 2-D matrix visualization Indexed images and colormaps True color images Part 3 Data Analysis Objective: Perform typical data analysis tasks in MATLAB, including developing and fitting theoretical models to real-life data. This leads naturally to one of the most powerful features of MATLAB: solving linear systems of equations with a single command. Dealing with missing data Correlation Smoothing Spectral analysis and FFTs Solving linear systems of equations Writing Functions Objective: Increase automation by encapsulating modular tasks as user-defined functions. Understand how MATLAB resolves references to files and variables. Why functions? Creating functions Adding comments Calling subfunctions Workspaces Subfunctions Path and precedence Data Types Objective: Explore data types, focusing on the syntax for creating variables and accessing array elements, and discuss methods for converting among data types. Data types differ in the kind of data they may contain and the way the data is organized. MATLAB data types Integers Structures Converting types File I/O Objective: Explore the low-level data import and export functions in MATLAB that allow precise control over text and binary file I/O. These functions include textscan, which provides precise control of reading text files. Opening and closing files Reading and writing text files Reading and writing binary files Note that the course as actually delivered might be subject to minor discrepancies from the outline above without prior notification. Conclusion
Objectives: Summarise what we have learnt A summary of the course Other upcoming courses on MATLAB
806636 Hadoop for Developers (4 days) 28 hours Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course will introduce a developer to the various components of the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive and HBase). Section 1: Introduction to Hadoop hadoop history, concepts ecosystem distributions high level architecture hadoop myths hadoop challenges hardware / software Lab: first look at Hadoop Section 2: HDFS Design and architecture concepts (horizontal scaling, replication, data locality, rack awareness) Daemons: Namenode, Secondary namenode, Data node communications / heart-beats data integrity read / write path Namenode High Availability (HA), Federation labs: Interacting with HDFS Section 3: Map Reduce concepts and architecture daemons (MRV1): jobtracker / tasktracker phases: driver, mapper, shuffle/sort, reducer Map Reduce Version 1 and Version 2 (YARN) Internals of Map Reduce Introduction to Java Map Reduce program labs: Running a sample MapReduce program Section 4: Pig pig vs java map reduce pig job flow pig latin language ETL with Pig Transformations & Joins User defined functions (UDF) labs: writing Pig scripts to analyze data Section 5: Hive architecture and design data types SQL support in Hive Creating Hive tables and querying partitions joins text processing labs: various labs on processing data with Hive Section 6: HBase concepts and architecture hbase vs RDBMS vs cassandra HBase Java API Time series data on HBase schema design labs: Interacting with HBase using shell; programming in HBase Java API; Schema design exercise
209768 Big Data Business Intelligence for Govt. Agencies 40 hours Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to accomplish their missions, they are laying the groundwork to correlate dependencies across events, people, processes, and information. High-value government solutions will be created from a mashup of the most disruptive technologies: Mobile devices and applications Cloud services Social business technologies and networking Big Data and analytics IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data — related and unrelated, structured and unstructured.
But accomplishing these feats takes far more than simply accumulating massive quantities of data. “Making sense of these volumes of Big Data requires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog. The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it. The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge. Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.) Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.
Breakdown of topics on a daily basis: (Each session is 2 hours) Day-1: Session-1: Business Overview of Why Big Data Business Intelligence in Govt. Case Studies from NIH, DoE Big Data adoption rate in Govt. Agencies and how they are aligning their future operations around Big Data Predictive Analytics Broad Scale Application Area in DoD, NSA, IRS, USDA etc. Interfacing Big Data with Legacy data Basic understanding of enabling technologies in predictive analytics Data Integration & Dashboard visualization Fraud management Business Rule/Fraud detection generation Threat detection and profiling Cost-benefit analysis for Big Data implementation Day-1: Session-2: Introduction to Big Data-1 Main characteristics of Big Data - volume, variety, velocity and veracity. MPP architecture for volume. Data Warehouses – static schema, slowly evolving dataset MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc. Hadoop Based Solutions – no conditions on structure of dataset. Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS Batch - suited for analytical/non-interactive Velocity: CEP streaming data Typical choices – CEP products (e.g.
Infostreams, Apama, MarkLogic etc) Less production ready – Storm/S4 NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database Day-1: Session-3: Introduction to Big Data-2 NoSQL solutions KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB) KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB KV Store (Hierarchical) - GT.m, Cache KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord KV Cache - Memcached, Repcached, Coherence, Infinispan, EXtremeScale, JBossCache, Velocity, Terracotta Tuple Store - Gigaspaces, Coord, Apache River Object Database - ZopeDB, DB40, Shoal Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI Varieties of Data: Introduction to data cleaning issues in Big Data RDBMS – static structure/schema, doesn’t promote an agile, exploratory environment. NoSQL – semi-structured, enough structure to store data without an exact schema before storing it Data cleaning issues Day-1: Session-4: Big Data Introduction-3: Hadoop When to select Hadoop?
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration) SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB) Warehousing data = HUGE effort and static even after implementation For variety & volume of data, crunched on commodity hardware – HADOOP Commodity H/W needed to create a Hadoop Cluster Introduction to MapReduce/HDFS MapReduce – distribute computing over multiple servers HDFS – make data available locally for the computing process (with redundancy) Data – can be unstructured/schema-less (unlike RDBMS) Developer responsibility to make sense of data Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS Day-2: Session-1: Big Data Ecosystem-Building Big Data ETL: universe of Big Data Tools-which one to use and when? Hadoop vs. Other NoSQL solutions For interactive, random access to data HBase (column-oriented database) on top of Hadoop Random access to data but restrictions imposed (max 1 PB) Not good for ad-hoc analytics, good for logging, counting, time-series Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access) Flume – Stream data (e.g.
log data) into HDFS Day-2: Session-2: Big Data Management System Moving parts, compute nodes start/fail: ZooKeeper – for configuration/coordination/naming services Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain Deploy, configure, cluster management, upgrade etc (sys admin): Ambari In Cloud: Whirr Day-2: Session-3: Predictive analytics in Business Intelligence-1: Fundamental Techniques & Machine learning based BI: Introduction to Machine learning Learning classification techniques Bayesian Prediction – preparing training file Support Vector Machine KNN p-Tree Algebra & vertical mining Neural Network Big Data large variable problem – Random forest (RF) Big Data Automation problem – Multi-model ensemble RF Automation through Soft10-M Text analytic tool – Treeminer Agile learning Agent based learning Distributed learning Introduction to open source tools for predictive analytics: R, RapidMiner, Mahout Day-2: Session-4 Predictive analytics eco-system-2: Common predictive analytic problems in Govt. Insight analytic Visualization analytic Structured predictive analytic Unstructured predictive analytic Threat/fraudster/vendor profiling Recommendation Engine Pattern detection Rule/Scenario discovery – failure, fraud, optimization Root cause discovery Sentiment analysis CRM analytic Network analytic Text Analytics Technology assisted review Fraud analytic Real Time Analytic Day-3: Session-1: Real Time and Scalable Analytic Over Hadoop Why common analytic algorithms fail in Hadoop/HDFS Apache Hama – for Bulk Synchronous distributed computing Apache Spark – for cluster computing for real time analytic CMU GraphLab 2 – graph-based asynchronous approach to distributed computing KNN p-Algebra based approach from Treeminer for reduced hardware cost of operation Day-3: Session-2: Tools for eDiscovery and Forensics eDiscovery over Big Data vs.
Legacy data – a comparison of cost and performance Predictive coding and technology assisted review (TAR) Live demo of a TAR product (vMiner) to understand how TAR works for faster discovery Faster indexing through HDFS – velocity of data NLP or Natural Language Processing – various techniques and open source products eDiscovery in foreign languages – technology for foreign language processing Day-3: Session 3: Big Data BI for Cyber Security – understanding a whole 360-degree view, from speedy data collection to threat identification Understanding basics of security analytics – attack surface, security misconfiguration, host defenses Network infrastructure / large datapipe / response ETL for real time analytics Prescriptive vs predictive – fixed rule based vs auto-discovery of threat rules from metadata Day-3: Session 4: Big Data in USDA: Application in Agriculture Introduction to IoT (Internet of Things) for agriculture – sensor based Big Data and control Introduction to satellite imaging and its application in agriculture Integrating sensor and image data for soil fertility, cultivation recommendation and forecasting Agriculture insurance and Big Data Crop loss forecasting Day-4: Session-1: Fraud prevention BI from Big Data in Govt – Fraud analytics: Basic classification of fraud analytics – rule based vs predictive analytics Supervised vs unsupervised machine learning for fraud pattern detection Vendor fraud/overcharging for projects Medicare and Medicaid fraud – fraud detection techniques for claim processing Travel reimbursement frauds IRS refund frauds Case studies and live demos will be given wherever data is available.
Day-4: Session-2: Social Media Analytics – Intelligence gathering and analysis Big Data ETL API for extracting social media data Text, image, metadata and video Sentiment analysis from social media feeds Contextual and non-contextual filtering of social media feeds Social Media Dashboard to integrate diverse social media Automated profiling of social media profiles Live demo of each analytic will be given through the Treeminer Tool. Day-4: Session-3: Big Data Analytics in image processing and video feeds Image storage techniques in Big Data – storage solutions for data exceeding petabytes LTFS and LTO GPFS-LTFS (layered storage solution for Big image data) Fundamentals of image analytics Object recognition Image segmentation Motion tracking 3-D image reconstruction Day-4: Session-4: Big Data applications in NIH: Emerging areas of Bio-informatics Meta-genomics and Big Data mining issues Big Data predictive analytics for Pharmacogenomics, Metabolomics and Proteomics Big Data in downstream Genomics process Application of Big Data predictive analytics in Public health Big Data Dashboard for quick accessibility of diverse data and display: Integration of existing application platform with Big Data Dashboard Big Data management Case Study of Big Data Dashboard: Tableau and Pentaho Use Big Data app to push location based services in Govt. Tracking system and management Day-5: Session-1: How to justify Big Data BI implementation within an organization: Defining ROI for Big Data implementation Case studies for saving analyst time for collection and preparation of data – increase in productivity gain Case studies of revenue gain from saving the licensed database cost Revenue gain from location based services Savings from fraud prevention An integrated spreadsheet approach to calculate approximate expense vs. revenue gain/savings from Big Data implementation.
Day-5: Session-2: Step-by-step procedure to replace a legacy data system with a Big Data system: Understanding a practical Big Data migration roadmap What important information is needed before architecting a Big Data implementation What are the different ways of calculating volume, velocity, variety and veracity of data How to estimate data growth Case studies Day-5: Session 4: Review of Big Data vendors and review of their products. Q/A session: Accenture APTEAN (Formerly CDC Software) Cisco Systems Cloudera Dell EMC GoodData Corporation Guavus Hitachi Data Systems Hortonworks HP IBM Informatica Intel Jaspersoft Microsoft MongoDB (Formerly 10Gen) Mu Sigma NetApp Opera Solutions Oracle Pentaho Platfora QlikTech Quantum Rackspace Revolution Analytics Salesforce SAP SAS Institute Sisense Software AG/Terracotta Soft10 Automation Splunk Sqrrl Supermicro Tableau Software Teradata Think Big Analytics Tidemark Systems Treeminer VMware (Part of EMC)
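The MapReduce/HDFS introduction in Day-1 Session-4 above ("MapReduce – distribute computing over multiple servers") can be illustrated with a toy, single-process sketch. This is pure Python standing in for the Hadoop Java API, and the sample splits, words and counts are invented for illustration:

```python
from collections import defaultdict

def map_phase(split_text):
    # Mapper: emit a (word, 1) pair for every word in one input split
    for word in split_text.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group emitted values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate (here: sum) all values emitted for each key
    return {word: sum(values) for word, values in groups.items()}

# Two "splits" stand in for HDFS blocks processed on different nodes
splits = ["big data on clusters", "big clusters hold big data"]
pairs = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(pairs))
```

In a real job the mappers and reducers run on separate cluster nodes and read their splits from HDFS; only the three-phase structure carries over from this sketch.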
417091 Semantic Web Overview 7 hours The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Semantic Web Overview Introduction Purpose Standards Ontology Projects Resource Description Framework (RDF) Introduction Motivation and Goals RDF Concepts RDF Vocabulary URI and Namespace (Normative) Datatypes (Normative) Abstract Syntax (Normative) Fragment Identifiers
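The RDF concepts in the Semantic Web outline above rest on one data model: a graph of subject-predicate-object triples. A minimal in-memory sketch (the `ex:`/`foaf:` identifiers and the tiny dataset are invented for illustration; real stores use full URIs and SPARQL):

```python
# Minimal triple store illustrating the RDF subject-predicate-object model
triples = {
    ("ex:Alice", "foaf:knows", "ex:Bob"),
    ("ex:Alice", "foaf:name", '"Alice"'),
    ("ex:Bob", "foaf:name", '"Bob"'),
}

def query(s=None, p=None, o=None):
    # Basic pattern matching: None acts as a wildcard, like a SPARQL variable
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

friends = query(s="ex:Alice", p="foaf:knows")
```

Because everything is a triple, data from different applications can be merged by simply taking the union of their triple sets, which is the sharing-and-reuse point the course description makes.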
806635 Advanced Hadoop for Developers 21 hours Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase.  These advanced programming techniques will be beneficial to experienced Hadoop developers. Audience: developers Duration: three days Format: lectures (50%) and hands-on labs (50%).   Section 1: Data Management in HDFS   Various Data Formats (JSON / Avro / Parquet) Compression Schemes Data Masking Labs : Analyzing different data formats;  enabling compression Section 2: Advanced Pig   User-defined Functions Introduction to Pig Libraries (ElephantBird / Data-Fu) Loading Complex Structured Data using Pig Pig Tuning Labs : advanced pig scripting, parsing complex data types Section 3 : Advanced Hive   User-defined Functions Compressed Tables Hive Performance Tuning Labs : creating compressed tables, evaluating table formats and configuration Section 4 : Advanced HBase   Advanced Schema Modelling Compression Bulk Data Ingest Wide-table / Tall-table comparison HBase and Pig HBase and Hive HBase Performance Tuning Labs : tuning HBase; accessing HBase data from Pig & Hive; Using Phoenix for data modeling
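The wide-table vs. tall-table comparison in Section 4 of the course above comes down to row-key design. A sketch of the two layouts for hypothetical sensor readings (the table shape, key format and sensor names are invented; real designs depend on access patterns):

```python
# Tall table: one row per (sensor, timestamp). Zero-padding the timestamp keeps
# HBase's lexicographic byte ordering identical to numeric time order, so a
# scan over the prefix b"sensor42#" returns readings chronologically.
def tall_row_key(sensor_id, ts):
    return f"{sensor_id}#{ts:010d}".encode()

# Wide table: one row per sensor, one column qualifier per timestamp. The row
# grows without bound, which is why wide designs suit bounded column sets.
def wide_cell(sensor_id, ts):
    return sensor_id.encode(), f"t:{ts:010d}".encode()

keys = [tall_row_key("sensor42", t) for t in (99, 100, 5)]
ordered = sorted(keys)  # the order HBase would store them in
```

Without the zero-padding, key "sensor42#100" would sort before "sensor42#99", which is a classic row-key pitfall the schema-modelling labs are likely to exercise.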
118127 The MapReduce Model in the Apache Hadoop Implementation 14 hours The training is aimed at organizations that want to deploy solutions for processing large data sets on clusters. Data Mining and Business Intelligence Introduction Application areas Capabilities Fundamentals of data exploration and knowledge discovery Big Data What do we mean by Big Data? Big Data vs. Data Mining MapReduce Description of the model Example application Statistics Cluster model Hadoop What is Hadoop Installation Basic configuration Cluster settings Architecture and configuration of the Hadoop Distributed File System Console commands and operation The DistCp tool MapReduce and Hadoop Streaming Administration and configuration of Hadoop On Demand Alternative solutions
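The Hadoop Streaming topic in the outline above can be sketched as a pair of Streaming-style scripts: the mapper emits tab-separated key/value lines, Hadoop sorts them, and the reducer sums each run of identical keys. A minimal sketch with invented sample input (a real Streaming job reads `sys.stdin` and prints to stdout):

```python
from itertools import groupby

def mapper(lines):
    # Streaming mapper: emit one "word\t1" line per word
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Streaming reducer: Hadoop delivers lines sorted by key, so each word
    # arrives as one contiguous run that groupby can sum
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# sorted() stands in for Hadoop's shuffle/sort between the two scripts
mapped = sorted(mapper(["hadoop streaming demo", "demo hadoop"]))
result = dict(line.split("\t") for line in reducer(mapped))
```

The attraction of Streaming, which the course can contrast with native Java MapReduce, is that the mapper and reducer are plain executables communicating over text streams.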
806641 Hadoop For Administrators 21 hours Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice bulk data loads into the cluster, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes with a discussion of securing the cluster with Kerberos. “…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized” — Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising Audience Hadoop administrators Format Lectures and hands-on labs, approximate balance 60% lectures, 40% labs. Prerequisites Introduction Hadoop history, concepts Ecosystem Distributions High level architecture Hadoop myths Hadoop challenges (hardware / software) Labs: discuss your Big Data projects and problems Planning and installation Selecting software, Hadoop distributions Sizing the cluster, planning for growth Selecting hardware and network Rack topology Installation Multi-tenancy Directory structure, logs Benchmarking Labs: cluster install, run performance benchmarks HDFS operations Concepts (horizontal scaling, replication, data locality, rack awareness) Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) Health monitoring Command-line and browser-based administration Adding storage, replacing defective drives Labs: getting familiar with HDFS command lines Data ingestion Flume for logs and other data ingestion into HDFS Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL Hadoop data warehousing with Hive Copying data between clusters (distcp) Using S3 as complementary to HDFS Data ingestion best practices and
architectures Labs: setting up and using Flume, the same for Sqoop MapReduce operations and administration Parallel computing before MapReduce: comparing HPC vs. Hadoop administration MapReduce cluster loads Nodes and Daemons (JobTracker, TaskTracker) MapReduce UI walk-through MapReduce configuration Job config Optimizing MapReduce Fool-proofing MR: what to tell your programmers Labs: running MapReduce examples YARN: new architecture and new capabilities YARN design goals and implementation architecture New actors: ResourceManager, NodeManager, ApplicationMaster Installing YARN Job scheduling under YARN Labs: investigate job scheduling Advanced topics Hardware monitoring Cluster monitoring Adding and removing servers, upgrading Hadoop Backup, recovery and business continuity planning Oozie job workflows Hadoop high availability (HA) Hadoop Federation Securing your cluster with Kerberos Labs: set up monitoring Optional tracks Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5) Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
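The rack-awareness material in the HDFS operations section above can be made concrete with a simplified model of HDFS's default placement policy for a replication factor of 3: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The topology and node names below are invented, and the real NameNode also weighs node load and free space:

```python
import random

def place_replicas(writer_node, topology, rng=random):
    # topology: {rack_name: [node, ...]}. Returns three nodes following a
    # simplified version of HDFS's default block placement policy.
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer_node                                     # replica 1: the local node
    remote_rack = rng.choice([r for r in topology if r != rack_of[first]])
    second = rng.choice(topology[remote_rack])              # replica 2: a different rack
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]                           # replica 3: same remote rack

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
replicas = place_replicas("n1", topology)
```

The policy survives the loss of a whole rack while keeping two of the three replicas on one rack to limit cross-rack write traffic, which is the trade-off the course's rack-topology planning topic addresses.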
209771 IoT (Internet of Things) for Entrepreneurs, Managers and Investors 21 hours Estimates for the Internet of Things (IoT) market value are massive, since by definition the IoT is an integrated and diffused layer of devices, sensors, and computing power that overlays entire consumer, business-to-business, and government industries. The IoT will account for an increasingly huge number of connections: 1.9 billion devices today, and 9 billion by 2018. That year, it will be roughly equal to the number of smartphones, smart TVs, tablets, wearable computers, and PCs combined. In the consumer space, many products and services have already crossed over into the IoT, including kitchen and home appliances, parking, RFID, lighting and heating products, and a number of applications in the Industrial Internet. However, the underlying technologies of the IoT are nothing new, as M2M communication has existed since the birth of the Internet. What has changed in the last couple of years is the emergence of a number of inexpensive wireless technologies, aided by the overwhelming adoption of smartphones and tablets in every home. The explosive growth of mobile devices led to the present demand for IoT. Due to the unbounded opportunities in the IoT business, a large number of small and medium-sized entrepreneurs have jumped on the bandwagon of the IoT gold rush. Also, due to the emergence of open-source electronics and IoT platforms, the cost of developing an IoT system and managing its sizeable production is increasingly affordable. Existing electronic product owners are experiencing pressure to integrate their devices with the Internet or mobile apps. This training is intended as a technology and business review of an emerging industry so that IoT enthusiasts/entrepreneurs can grasp the basics of IoT technology and business.
Course objectives The main objective of the course is to introduce emerging technological options, platforms and case studies of IoT implementation in home & city automation (smart homes and cities), Industrial Internet, healthcare, Govt., Mobile Cellular and other areas. Basic introduction of all the elements of IoT – mechanical, electronics/sensor platform, wireless and wireline protocols, mobile to electronics integration, mobile to enterprise integration, data analytics and total control plane. M2M wireless protocols for IoT – WiFi, Zigbee/Zwave, Bluetooth, ANT+: When and where to use which one? Mobile/Desktop/Web app – for registration, data acquisition and control – available M2M data acquisition platforms for IoT – Xively, Omega and NovoTech, etc. Security issues and security solutions for IoT Open source/commercial electronics platforms for IoT – Raspberry Pi, Arduino, ARM mbed LPC etc. Open source/commercial enterprise cloud platforms for IoT – Ayla, ioBridge, Libelium, Axeda, Cisco Fog cloud Studies of business and technology of some of the common IoT devices like home automation, smoke alarms, vehicles, military, home health etc. Target Audience Investors and IoT entrepreneurs Managers and Engineers whose company is venturing into the IoT space Business Analysts & Investors 1. Day-1: Session-1: Business Overview of Why IoT is so important Case Studies from Nest, CISCO and top industries IoT adoption rate in North America and how companies are aligning their future business model and operation around IoT Broad Scale Application Area Smart house and smart city Industrial Internet Smart Cars Home healthcare Business Rule generation for IoT 3-layered architecture of Big Data – Physical (Sensors), Communication and Data Intelligence 2.
Day-1: Session-2: Introduction to IoT: All about Sensors Basic function and architecture of a sensor – sensor body, sensor mechanism, sensor calibration, sensor maintenance, cost and pricing structure, legacy and modern sensor networks – all basics about the sensors Development of sensor electronics – IoT vs legacy, and open source vs traditional PCB design style Development of sensor communication protocols – history to modern days. Legacy protocols like Modbus, relay, HART to modern day Zigbee, Zwave, X10, Bluetooth, ANT etc. Business drivers for sensor deployment – FDA/EPA regulation, fraud/tampering detection, supervision, quality control and process management Different kinds of calibration techniques – manual, automation, infield, primary and secondary calibration – their implication in IoT Powering options for sensors – battery, solar, WiTricity, mobile and PoE 3. Day-1: Session-3: Introduction to Sensor Network and Wireless protocol What is a sensor network? Wireless vs. wireline network WiFi – 802.11 families: N to S – application of each standard and common vendors. Zigbee and Zwave – advantage of low-power mesh networking. Long distance Zigbee. Introduction to different Zigbee chips. Bluetooth/BLE: low power vs high power, speed of detection, classes of BLE. Introduction of Bluetooth vendors & their review. X10, ANT+ Other long distance RF communication links LOS vs NLOS links Capacity and throughput calculation Application issues in wireless protocols – power consumption, reliability, PER, QoS, LOS 4.
Day-1: Session-4: Review of Electronics Platform, production and cost projection PCB vs FPGA vs ASIC design – how to take the decision Prototyping electronics vs production electronics QA certification for IoT – CE/CSA/UL/IEC/RoHS/IP65: What are those? Basic introduction of multi-layer PCB design and its workflow Electronics reliability – basic concepts of FIT and early mortality rate Environmental and reliability testing – basic concepts Basic open source platforms: Arduino, Raspberry Pi, Beaglebone – when needed? RedBack, Diamond Back 5. Day-2: Session-1: Conceiving a new IoT product – Product requirement document for IoT State of the present art and review of existing technology in the marketplace Suggestions for new features and technologies based on market analysis and patent issues Detailed technical specs for new products – system, software, hardware, mechanical, installation etc. Packaging and documentation requirements Servicing and customer support requirements High level design (HLD) for understanding of product concept Release plan for phase-wise introduction of the new features Skill set for the development team and proposed project plan – cost & duration Target manufacturing price 6. Day-2: Session-2: Introduction to Mobile app platform for IoT Protocol stack of Mobile app for IoT Mobile to server integration – what are the factors to look out for What intelligent layers can be introduced at the Mobile app level? iBeacon in iOS Windows Azure Linkafy Mobile platform for IoT Axeda Xively 7.
Day-2: Session-3: Machine learning for intelligent IoT Introduction to Machine learning Learning classification techniques Bayesian Prediction – preparing training file Support Vector Machine Image and video analytics for IoT Fraud and alert analytics through IoT Biometric ID integration with IoT Real Time Analytic/Stream Analytic Scalability issues of IoT and machine learning What are the architectural implementations of Machine learning for IoT 8. Day-2: Session-4 Analytic Engine for IoT Insight analytic Visualization analytic Structured predictive analytic Unstructured predictive analytic Recommendation Engine Pattern detection Rule/Scenario discovery – failure, fraud, optimization Root cause discovery 9. Day-3: Session-1: Security in IoT implementation Why security is absolutely essential for IoT Mechanism of security breach in IoT layer Privacy enhancing technologies Fundamentals of network security Encryption and cryptography implementation for IoT data Security standards for available platforms European legislation for security in IoT platforms Secure booting Device authentication Firewalling and IPS Updates and patches 10. Day-3: Session-2: Database implementation for IoT: Cloud based IoT platforms SQL vs NoSQL – which one is good for your IoT application Open source vs. licensed databases Available M2M cloud platforms Axeda Xively Omega NovoTech Ayla Libelium CISCO M2M platform AT&T M2M platform Google M2M platform 11. Day-3, Session-3: A few common IoT systems Home automation Energy optimization in Home Automotive – OBD IoT-Lock Smart Smoke alarm BAC (blood alcohol monitoring) for drug abusers under probation Pet cam for Pet lovers Wearable IoT Mobile parking ticketing system Indoor location tracking in Retail store Home health care Smart Sports Watch 12.
Day-3: Session-4: Big Data for IoT 4V – volume, velocity, variety and veracity of Big Data Why Big Data is important in IoT Big Data vs legacy data in IoT Hadoop for IoT – when and why? Storage techniques for image, geospatial and video data Distributed databases Parallel computing basics for IoT
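The classification techniques listed in the machine-learning session (Day-2, Session-3) can be sketched with one of the simplest classifiers, k-nearest neighbours, applied to invented sensor readings (the feature values, labels and the choice of kNN itself are illustrative; the course also covers Bayesian prediction and SVMs):

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label). Classify the query point by
    # majority vote among its k nearest training examples.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical (temperature, vibration) readings labelled with machine state
train = [
    ((20.0, 0.1), "ok"),
    ((21.0, 0.2), "ok"),
    ((80.0, 2.0), "fault"),
    ((78.0, 1.8), "fault"),
]
state = knn_predict(train, (79.0, 1.9))
```

In a real IoT pipeline the features would be scaled first, since kNN's distance metric is sensitive to units, which touches the scalability and architecture issues the session raises.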
806639 Hadoop for Business Analysts 21 hours Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course will introduce an analyst to the core components of the Hadoop ecosystem and its analytics capabilities Audience Business Analysts Duration three days Format Lectures and hands-on labs. Section 1: Introduction to Hadoop Hadoop history, concepts ecosystem distributions high level architecture Hadoop myths Hadoop challenges hardware / software Labs: first look at Hadoop Section 2: HDFS Overview concepts (horizontal scaling, replication, data locality, rack awareness) architecture (NameNode, Secondary NameNode, DataNode) data integrity future of HDFS: NameNode HA, Federation Labs: Interacting with HDFS Section 3: MapReduce Overview MapReduce concepts daemons: JobTracker / TaskTracker phases: driver, mapper, shuffle/sort, reducer Thinking in MapReduce Future of MapReduce (YARN) Labs: Running a MapReduce program Section 4: Pig Pig vs Java MapReduce Pig Latin language user defined functions understanding Pig job flow basic data analysis with Pig complex data analysis with Pig multi datasets with Pig advanced concepts Lab: writing Pig scripts to analyze / transform data Section 5: Hive Hive concepts architecture SQL support in Hive data types table creation and queries Hive data management partitions & joins text analytics Labs (multiple): creating Hive tables and running queries, joins, using partitions, using text analytics functions Section 6: BI Tools for Hadoop BI tools and Hadoop Overview of current BI tools landscape Choosing the best tool for the job
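The Hive material in Section 5 above is ultimately about expressing aggregations over tables in SQL. The shape of a basic query the labs might run can be mimicked in plain Python (the sales table, column names and SQL text are invented for illustration):

```python
from collections import defaultdict

# Rough equivalent of a basic Hive query:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;
def group_by_sum(rows, key, value):
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]  # accumulate one total per group key
    return dict(totals)

sales = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 5.0},
    {"region": "east", "amount": 2.5},
]
totals = group_by_sum(sales, "region", "amount")
```

Hive compiles such a query into MapReduce jobs (the grouping is exactly the shuffle phase), which is why an analyst who understands GROUP BY already understands most of what Section 3 describes.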
209766 Big Data Business Intelligence for Telecom & Communication Service Providers 35 hours Overview Communications service providers (CSPs) are facing pressure to reduce costs and maximize average revenue per user (ARPU) while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow at a compound annual growth rate (CAGR) of 78 percent to 2016, reaching 10.8 exabytes per month. Meanwhile, CSPs are generating large volumes of data, including call detail records (CDR), network data and customer data. Companies that fully exploit this data gain a competitive edge. According to a recent survey by The Economist Intelligence Unit, companies that use data-directed decision-making enjoy a 5-6% boost in productivity. Yet 53% of companies leverage only half of their valuable data, and one-fourth of respondents noted that vast quantities of useful data go untapped. The data volumes are so high that manual analysis is impossible, and most legacy software systems can’t keep up, resulting in valuable data being discarded or ignored. With high-speed, scalable Big Data & Analytics software, CSPs can mine all their data for better decision-making in less time. Different Big Data products and techniques provide an end-to-end software platform for collecting, preparing, analyzing and presenting insights from big data. Application areas include network performance monitoring, fraud detection, customer churn detection and credit risk analysis. Big Data & Analytics products scale to handle terabytes of data, but implementation of such tools needs a new kind of cloud-based database system like Hadoop or a massive-scale parallel computing processor (KPU etc.) This course on Big Data BI for Telco covers all the emerging new areas in which CSPs are investing for productivity gain and opening up new business revenue streams.
The course will provide a complete 360-degree overview of Big Data BI in Telco so that decision makers and managers can have a very wide and comprehensive overview of the possibilities of Big Data BI in Telco for productivity and revenue gain. Course objectives The main objective of the course is to introduce new Big Data business intelligence techniques in 4 sectors of Telecom Business (Marketing/Sales, Network Operation, Financial operation and Customer Relation Management). Students will be introduced to the following: Introduction to Big Data – what the 4Vs (volume, velocity, variety and veracity) are in Big Data – generation, extraction and management from a Telco perspective How Big Data analytics differs from legacy data analytics In-house justification of Big Data – Telco perspective Introduction to the Hadoop Ecosystem – familiarity with all Hadoop tools like Hive, Pig, Spark – when and how they are used to solve Big Data problems How Big Data is extracted for analysis with analytics tools – how Business Analysts can reduce their pain points of collection and analysis of data through an integrated Hadoop dashboard approach Basic introduction of insight analytics, visualization analytics and predictive analytics for Telco Customer churn analytics and Big Data – how Big Data analytics can reduce customer churn and customer dissatisfaction in Telco – case studies Network failure and service failure analytics from network metadata and IPDR Financial analysis – fraud, wastage and ROI estimation from sales and operational data Customer acquisition problems – target marketing, customer segmentation and cross-sale from sales data Introduction and summary of all Big Data analytics products and where they fit into the Telco analytics space Conclusion – how to take a step-by-step approach to introduce Big Data Business Intelligence in your organization Target Audience Network operation, Financial Managers, CRM managers and top IT managers in Telco CIO office.
Business Analysts in Telco CFO office managers/analysts Operational managers QA managers Breakdown of topics on daily basis: (Each session is 2 hours) Day-1: Session-1: Business Overview of Why Big Data Business Intelligence in Telco. Case Studies from T-Mobile, Verizon etc. Big Data adoption rate in North American Telcos and how they are aligning their future business model and operation around Big Data BI Broad Scale Application Area Network and Service management Customer Churn Management Data Integration & Dashboard visualization Fraud management Business Rule generation Customer profiling Localized Ad pushing Day-1: Session-2: Introduction to Big Data-1 Main characteristics of Big Data – volume, variety, velocity and veracity. MPP architecture for volume. Data Warehouses – static schema, slowly evolving dataset MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc. Hadoop Based Solutions – no conditions on structure of dataset. Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS Batch – suited for analytical/non-interactive Velocity: CEP streaming data Typical choices – CEP products (e.g.
Infostreams, Apama, MarkLogic etc) Less production ready – Storm/S4 NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database Day-1: Session-3: Introduction to Big Data-2 NoSQL solutions KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB) KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB KV Store (Hierarchical) - GT.M, Caché KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtremeScale, JBossCache, Velocity, Terracotta Tuple Store - Gigaspaces, Coord, Apache River Object Database - ZopeDB, db4o, Shoal Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI Varieties of Data: Introduction to Data Cleaning issues in Big Data RDBMS – static structure/schema, doesn’t promote an agile, exploratory environment. NoSQL – semi-structured, enough structure to store data without exact schema before storing data Data cleaning issues Day-1: Session-4: Big Data Introduction-3: Hadoop When to select Hadoop?
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration) SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB) Warehousing data = HUGE effort and static even after implementation For variety & volume of data, crunched on commodity hardware – HADOOP Commodity H/W needed to create a Hadoop Cluster Introduction to MapReduce/HDFS MapReduce – distribute computing over multiple servers HDFS – make data available locally for the computing process (with redundancy) Data – can be unstructured/schema-less (unlike RDBMS) Developer responsibility to make sense of data Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS Day-2: Session-1: Big Data Ecosystem-Building Big Data ETL: universe of Big Data Tools-which one to use and when? Hadoop vs. Other NoSQL solutions For interactive, random access to data HBase (column-oriented database) on top of Hadoop Random access to data but restrictions imposed (max 1 PB) Not good for ad-hoc analytics, good for logging, counting, time-series Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access) Flume – Stream data (e.g.
log data) into HDFS Day-2: Session-2: Big Data Management System Moving parts, compute nodes start/fail: ZooKeeper – for configuration/coordination/naming services Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain Deploy, configure, cluster management, upgrade etc (sys admin): Ambari In Cloud: Whirr Day-2: Session-3: Predictive analytics in Business Intelligence-1: Fundamental Techniques & Machine learning based BI: Introduction to Machine learning Learning classification techniques Bayesian Prediction – preparing training file Support Vector Machine Neural Network Big Data large variable problem – Random forest (RF) Big Data Automation problem – Multi-model ensemble RF Automation through Soft10-M Agile learning Agent based learning – example from Telco operation Distributed learning – example from Telco operation Introduction to open source tools for predictive analytics: R, RapidMiner, Mahout Day-2: Session-4 Predictive analytics eco-system-2: Common predictive analytic problems in Telecom Insight analytic Visualization analytic Structured predictive analytic Unstructured predictive analytic Customer profiling Recommendation Engine Pattern detection Rule/Scenario discovery – failure, fraud, optimization Root cause discovery Sentiment analysis CRM analytic Network analytic Text Analytics Technology assisted review Fraud analytic Real Time Analytic Day-3: Session-1: Network Operation analytics – root cause analysis of network failures, service interruption from metadata, IPDR and CRM: CPU Usage Memory Usage QoS Queue Usage Device Temperature Interface Error IOS versions Routing Events Latency variations Syslog analytics Packet Loss Performance Threshold Device Traps IPDR (IP Detail Record) collection and processing Use of IPDR data for Subscriber Bandwidth consumption, Network interface utilization, modem status and diagnostic HFC information Day-3: Session-2: Tools for Network service failure analysis: Network Summary Dashboard: monitor
overall network deployments and track your organization's key performance indicators Peak Period Analysis Dashboard: understand the application and subscriber trends driving peak utilization, with location-specific granularity Routing Efficiency Dashboard: control network costs and build business cases for capital projects with a complete understanding of interconnect and transit relationships Real-Time Entertainment Dashboard: access metrics that matter, including video views, duration, and video quality of experience (QoE) IPv6 Transition Dashboard: investigate the ongoing adoption of IPv6 on your network and gain insight into the applications and devices driving trends Case-Study-1: The Alcatel-Lucent Big Network Analytics (BNA) Data Miner Multi-dimensional mobile intelligence (m.IQ6) Day-3: Session-3: Big Data BI for Marketing/Sales – understanding sales/marketing from sales data: (all of them will be shown with a live predictive analytics demo) To identify highest-velocity clients To identify clients for given products To identify the right set of products for a client (Recommendation Engine) Market segmentation techniques Cross-sell and upsell techniques Client segmentation techniques Sales revenue forecasting techniques Day-3: Session-4: BI needed for a Telco CFO office: Overview of Business Analytics work needed in a CFO office Risk analysis on new investment Revenue, profit forecasting New client acquisition forecasting Loss forecasting Fraud analytics on finances (details next session) Day-4: Session-1: Fraud prevention BI from Big Data in Telco – Fraud analytics: Bandwidth leakage / Bandwidth fraud Vendor fraud/overcharging for projects Customer refund/claims frauds Travel reimbursement frauds Day-4: Session-2: From Churn Prediction to Churn Prevention: 3 types of churn: Active/Deliberate, Rotational/Incidental, Passive/Involuntary 3 classifications of churned customers: Total, Hidden, Partial Understanding CRM variables for churn Customer behavior data 
collection Customer perception data collection Customer demographics data collection Cleaning CRM Data Unstructured CRM data (customer calls, tickets, emails) and its conversion to structured data for churn analysis Social Media CRM – a new way to extract the customer satisfaction index Case Study-1: T-Mobile USA: Churn Reduction by 50% Day-4: Session-3: How to use predictive analysis for root cause analysis of customer dissatisfaction: Case Study-1: Linking dissatisfaction to issues – accounting, engineering failures like service interruption, poor bandwidth service Case Study-2: Big Data QA dashboard to track the customer satisfaction index from various parameters such as call escalations, criticality of issues, pending service interruption events etc. Day-4: Session-4: Big Data Dashboard for quick accessibility of diverse data and display: Integration of the existing application platform with a Big Data Dashboard Big Data management Case Study of Big Data Dashboard: Tableau and Pentaho Use of a Big Data app to push location-based advertisements Tracking system and management Day-5: Session-1: How to justify a Big Data BI implementation within an organization: Defining ROI for a Big Data implementation Case studies of saving analyst time in the collection and preparation of data – increase in productivity gain Case studies of revenue gain from customer churn Revenue gain from location-based and other targeted ads An integrated spreadsheet approach to calculate approximate expense vs. revenue gain/savings from a Big Data implementation Day-5: Session-2: Step-by-step procedure to replace a legacy data system with a Big Data system: Understanding a practical Big Data migration roadmap What important information is needed before architecting a Big Data implementation What are the different ways of calculating the volume, velocity, variety and veracity of data How to estimate data growth Case studies in 2 Telcos Day-5: Session 3 & 4: Review of Big Data vendors and their products. 
Q/A session: Accenture, Alcatel-Lucent, Amazon A9, APTEAN (formerly CDC Software), Cisco Systems, Cloudera, Dell, EMC, GoodData Corporation, Guavus, Hitachi Data Systems, Hortonworks, Huawei, HP, IBM, Informatica, Intel, Jaspersoft, Microsoft, MongoDB (formerly 10gen), Mu Sigma, NetApp, Opera Solutions, Oracle, Pentaho, Platfora, QlikTech, Quantum, Rackspace, Revolution Analytics, Salesforce, SAP, SAS Institute, Sisense, Software AG/Terracotta, Soft10 Automation, Splunk, Sqrrl, Supermicro, Tableau Software, Teradata, Think Big Analytics, Tidemark Systems, VMware (part of EMC) 
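The churn-prediction techniques in the outline above (Day-4, Session-2) can be illustrated with a minimal logistic-regression sketch in plain Python. This is a toy stand-in for the live predictive-analytics demos mentioned in the course, not their actual material; the CRM features and data below are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on log-loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log-loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Invented toy CRM features: (support calls per month, tenure in years)
X = [(0, 5), (1, 4), (0, 3), (8, 1), (7, 0), (9, 1)]
y = [0, 0, 0, 1, 1, 1]  # 1 = customer churned
w, b = train_logistic(X, y)
print(predict(w, b, (8, 0)))  # heavy caller, new customer: high churn score
```

In a real deployment the features would come from the cleaned structured and unstructured CRM data the outline describes, and a library model would replace this hand-rolled one.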
806640 Data Analytics With R 21 hours R is a very popular, open source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students. It covers language fundamentals, libraries and advanced concepts, as well as advanced data analytics and graphing with real-world data. Audience Developers / data analysts Duration 3 days Format Lectures and Hands-on Day One: Language Basics Course Introduction About Data Science Data Science Definition Process of Doing Data Science Introducing R Language Variables and Types Control Structures (Loops / Conditionals) R Scalars, Vectors, and Matrices Defining R Vectors Matrices String and Text Manipulation Character data type File IO Lists Functions Introducing Functions Closures lapply/sapply functions DataFrames Labs for all sections Day Two: Intermediate R Programming DataFrames and File I/O Reading data from files Data Preparation Built-in Datasets Visualization Graphics Package plot() / barplot() / hist() / boxplot() / scatter plot Heat Map ggplot2 package (qplot(), ggplot()) Exploration With Dplyr Labs for all sections Day 3: Advanced Programming With R Statistical Modeling With R Statistical Functions Dealing With NA Distributions (Binomial, Poisson, Normal) Regression Introducing Linear Regressions Recommendations Text Processing (tm package / Wordclouds) Clustering Introduction to Clustering KMeans Classification Introduction to Classification Naive Bayes Decision Trees Training using caret package Evaluating Algorithms R and Big Data Hadoop Big Data Ecosystem RHadoop Labs for all sections
226127 Sieci Neuronowe w R 14 hours This course is an introduction to applying neural networks to real-world problems using the R-project software. Introduction to Neural Networks What are Neural Networks What is the current status in applying neural networks Neural Networks vs regression models Supervised and Unsupervised learning Overview of available packages nnet, neuralnet and others Differences between packages and their limitations Visualizing neural networks Applying Neural Networks Concept of neurons and neural networks A simplified model of the brain Capabilities of a neuron XOR problem and the nature of the distribution of values The polymorphic nature of the sigmoid function Other activation functions Construction of neural networks Concept of connecting neurons Neural network as nodes Building a network Neurons Layers Weights Input and output data Range 0 to 1 Normalization Learning Neural Networks Backward Propagation Steps of propagation Network training algorithms Range of application Estimation Problems with approximation capability Examples OCR and image pattern recognition Other applications Implementing a neural network Modeling job: predicting stock prices of listed companies
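The XOR problem and backpropagation steps listed in the outline above can be sketched end-to-end in a few dozen lines; the course uses R packages (nnet, neuralnet), but for brevity this hand-rolled sketch is in Python. It is a minimal 2-2-1 sigmoid network with the classic "backward propagation of errors" update, not the course's actual lab code.

```python
import math
import random

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=20000, lr=0.5, seed=7):
    """Tiny 2-2-1 sigmoid network trained by backpropagation on XOR."""
    random.seed(seed)
    w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # hidden weights + bias
    w_o = [random.uniform(-1, 1) for _ in range(3)]                      # output weights + bias
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def forward(x):
        h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        return h, sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])

    def total_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in data)

    err_before = total_error()
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            d_o = (o - t) * o * (1 - o)  # output delta (sigmoid derivative)
            d_h = [d_o * w_o[i] * h[i] * (1 - h[i]) for i in range(2)]
            for i in range(2):           # propagate the error backwards
                w_o[i] -= lr * d_o * h[i]
                w_h[i] = [w_h[i][0] - lr * d_h[i] * x[0],
                          w_h[i][1] - lr * d_h[i] * x[1],
                          w_h[i][2] - lr * d_h[i]]
            w_o[2] -= lr * d_o
    return err_before, total_error(), [round(forward(x)[1]) for x, _ in data]

before, after, preds = train_xor()
print(after < before)  # the squared error shrinks as the network learns
```

A single-layer network cannot represent XOR, which is exactly why the outline pairs the XOR problem with the hidden layer and sigmoid discussion.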
806642 Solr for Developers 21 hours This course introduces students to the Solr platform. Through a combination of lecture, discussion and labs students will gain hands-on experience configuring effective search and indexing. The class begins with basic Solr installation and configuration then teaches the attendees the search features of Solr. Students will gain experience with faceting, indexing and search relevance among other features central to the Solr platform. The course wraps up with a number of advanced topics including spell checking, suggestions, Multicore and SolrCloud. Duration: 3 days Audience: Developers, business users, administrators Overall Goal Provide experienced web developers and technical staff with a comprehensive introduction to the Solr search platform. Teach software developers deep skills for creating search solutions. I. Fundamentals Solr Overview Installing and running Solr Adding content to Solr Reading a Solr XML response Changing parameters in the URL Using the browse interface Labs: install Solr, run queries II. Searching Sorting results Query parsers More queries Hardwiring request parameters Adding fields to default search Faceting Result grouping Labs: advanced queries, experiment with faceted search III. Indexing Adding your own content to Solr Deleting data from Solr Building a bookstore search Adding book data Exploring the book data Dedupe update processor Labs: indexing various document collections IV. Schema Updating Adding fields to the schema Analyzing text Labs: customize Solr schema V. Relevance Field weighting Phrase queries Function queries Fuzzier search Sounds-like Labs: implementing queries for relevance VI. Extended features More-like-this Geospatial Spell checking Suggestions Highlighting Pseudo-fields Pseudo-joins Multilanguage Labs: implementing spell checking and suggestions VII. Multicore Adding more kinds of data Labs: creating and administering cores VIII. 
SolrCloud Introduction How SolrCloud works Commit strategies ZooKeeper Managing Solr config files Labs: administer SolrCloud IX. Developing with Solr API Talking to Solr through REST Configuration Indexing and searching Solr and Spring Labs: code to read and write Solr index, exercise in Spring with Solr X. Developing with Lucene API Building a Lucene index Searching, viewing, debugging Extracting text with Tika Scaling Lucene indices on clusters Lucene performance tuning Labs: coding with Lucene XI. Conclusion Other approaches to search ElasticSearch DataStax Enterprise: Solr+Cassandra Cloudera Solr integration Blur Future directions
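Talking to Solr through REST (Section IX above) is plain HTTP: a query is just a GET against a core's /select handler. The sketch below only builds and prints such a request URL with the standard q, fq, rows and wt parameters; the host, port and core name (books) are assumptions matching a default local install, and no request is actually sent.

```python
from urllib.parse import urlencode

def solr_select_url(core, query, filters=(), rows=10,
                    base="http://localhost:8983/solr"):
    """Build a Solr /select query URL (no HTTP request is sent here)."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    params += [("fq", f) for f in filters]  # filter queries narrow the result set
    return "%s/%s/select?%s" % (base, core, urlencode(params))

url = solr_select_url("books", "title:solr", filters=["inStock:true"], rows=5)
print(url)
# http://localhost:8983/solr/books/select?q=title%3Asolr&rows=5&wt=json&fq=inStock%3Atrue
```

In the labs a client library such as SolrJ (or Spring's Solr support) would issue the request and parse the JSON response; the URL structure stays the same.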
238323 Administrator Training for Apache Hadoop 35 hours The main goal of the course is to gain advanced knowledge of administering the Apache Hadoop system in MapReduce and YARN environments. The course focuses mainly on the architecture of the Hadoop system, in particular the HDFS file system, the MapReduce and YARN programming models, and issues related to planning, installing, configuring, administering, managing and monitoring a Hadoop cluster. Other Big Data topics such as HBase, Cassandra, Impala, Pig, Hive and Sqoop are also covered, although briefly. The course is intended mainly for IT professionals who want to prepare for and pass the CCAH exam (Cloudera Certified Administrator for Apache Hadoop). 1: HDFS (17%) Functions of the individual Apache Hadoop daemons Storing and processing data in the Hadoop system When to choose the Hadoop system HDFS architecture and operation HDFS Federation HDFS High Availability HDFS security (Kerberos) Case studies The file read and write process in HDFS The HDFS command-line interface 2: YARN and MapReduce version 2 (MRv2) (17%): YARN configuration YARN deployment YARN architecture and operation Resource allocation in YARN Job execution flow in YARN Migration from MRv1 to YARN 3: Planning a Hadoop Cluster (16%) Requirements analysis and hardware selection Requirements analysis and operating system selection Selecting kernel parameters and storage configuration Matching hardware configuration to requirements Selecting cluster components and auxiliary tools System scalability: CPU, RAM and storage (IO) load and system capacity Storage-level scalability: JBOD vs RAID, network disks and the impact of virtualization on performance Network topologies: network load in the Hadoop system (HDFS and MapReduce) and connection optimization 4: Installing and Administering 
a Hadoop Cluster (25%) The impact of failures on cluster operation Log monitoring Basic metrics used by a Hadoop cluster Tools for monitoring a Hadoop cluster Auxiliary tools: Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, Pig and others Tools for administering a Hadoop cluster 5: Resource Management (10%) Queue architecture and functions Resource allocation with FIFO queues Resource allocation with fair queues Resource allocation with capacity queues 6: Monitoring and Logging (15%) Metrics monitoring Managing the NameNode and JobTracker from the Web GUI How to monitor Hadoop daemons Monitoring CPU usage on key servers in the cluster Monitoring RAM and swap usage Managing and viewing logs Log interpretation
806644 HBase for Developers 21 hours This course introduces HBase – a NoSQL store on top of Hadoop. The course is intended for developers who will be using HBase to develop applications, and administrators who will manage HBase clusters. We will walk a developer through HBase architecture, data modelling and application development on HBase. It will also discuss using MapReduce with HBase, and some administration topics related to performance optimization. The course is very hands-on with lots of lab exercises. Duration: 3 days Audience: Developers & Administrators Section 1: Introduction to Big Data & NoSQL Big Data ecosystem NoSQL overview CAP theorem When is NoSQL appropriate Columnar storage HBase and NoSQL Section 2: HBase Intro Concepts and Design Architecture (HMaster and Region Server) Data integrity HBase ecosystem Lab: Exploring HBase Section 3: HBase Data model Namespaces, Tables and Regions Rows, columns, column families, versions HBase Shell and Admin commands Lab: HBase Shell Section 4: Accessing HBase using Java API Introduction to Java API Read / Write path Time Series data Scans Map Reduce Filters Counters Co-processors Labs (multiple): Using the HBase Java API to implement time series, MapReduce, filters and counters. 
Section 5: HBase Schema Design: Group session students are presented with real world use cases students work in groups to come up with design solutions discuss / critique and learn from multiple designs Labs: implement a scenario in HBase Section 6: HBase Internals Understanding HBase under the hood MemStore / HFile / WAL HDFS storage Compactions Splits Bloom Filters Caches Diagnostics Section 7: HBase installation and configuration hardware selection install methods common configurations Lab: installing HBase Section 8: HBase eco-system developing applications using HBase interacting with the rest of the Hadoop stack (MapReduce, Pig, Hive) frameworks around HBase advanced concepts (co-processors) Labs: writing HBase applications Section 9: Monitoring And Best Practices monitoring tools and practices optimizing HBase HBase in the cloud real world use cases of HBase Labs: checking HBase vitals
417029 From Data to Decision with Big Data and Predictive Analytics 21 hours Audience If you try to make sense of the data you have access to or want to analyse unstructured data available on the net (like Twitter, LinkedIn, etc.) this course is for you. It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing. It is not aimed at people configuring the solution, though those people will benefit from the big picture. Delivery Mode During the course delegates will be presented with working examples of mostly open source technologies. Short lectures will be followed by presentations and simple exercises by the participants. Content and Software used All software used is updated each time the course is run, so we use the newest versions available. It covers the process from obtaining, formatting, processing and analysing data, and explains how to automate the decision-making process with machine learning. Quick Overview Data Sources Mining Data Recommender systems Target Marketing Datatypes Structured vs unstructured Static vs streamed Attitudinal, behavioural and demographic data Data-driven vs user-driven analytics Data validity Volume, velocity and variety of data Models Building models Statistical Models Machine learning Data Classification Clustering kGroups, k-means, nearest neighbours Ant colonies, birds flocking Predictive Models Decision trees Support vector machine Naive Bayes classification Neural networks Markov Model Regression Ensemble methods ROI Benefit/Cost ratio Cost of software Cost of development Potential benefits Building Models Data Preparation (MapReduce) Data cleansing Choosing methods Developing model Testing Model Model evaluation Model deployment and integration Overview of Open Source and commercial software Selection of R-project packages Python libraries Hadoop and Mahout Selected Apache projects related to Big Data and Analytics Selected commercial solutions 
Integration with existing software and data sources
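Of the clustering methods listed in the overview above, k-means is the easiest to show end-to-end. This is a minimal stdlib-only sketch with invented 2-D points, not production code; real courses would use a library implementation.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, recompute, repeat."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                  + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if the cluster emptied out
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Two well-separated blobs of invented data
pts = [(1.0, 1.1), (0.9, 0.8), (1.2, 1.0), (8.0, 8.2), (7.9, 7.7), (8.1, 8.0)]
print(sorted(kmeans(pts, 2)))
```

The same assign/recompute loop is what distributed implementations (e.g. in Mahout, mentioned above) run over MapReduce, with the assignment step as the map and the centroid recomputation as the reduce.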
464016 OpenStack Overview 7 hours The course is dedicated to IT engineers and architects who are looking for a solution to host a private or public IaaS (Infrastructure as a Service) cloud. This is also a great opportunity for IT managers to gain an overview of the possibilities enabled by OpenStack. Before you spend a lot of money on an OpenStack implementation, you can consider all the pros and cons by attending our course. This topic is also available as individual consultancy. Course goal: gaining basic knowledge regarding OpenStack Introduction: What is OpenStack? Foundations of Cloud Computing OpenStack vs VMware OpenStack evolution OpenStack distributions OpenStack releases OpenStack deployment solutions OpenStack competitors OpenStack Services: Underpinning services Keystone Glance Nova Neutron Cinder Horizon Swift Heat Ceilometer Trove Sahara Ironic Zaqar Manila Designate Barbican OpenStack Architecture: Node roles High availability Scalability Segregation Backup Monitoring Self service portal Interfaces Quotas Workflows Schedulers Migrations Load balancing Autoscaling Demonstration: How to download and execute RC files How to create an external network in Neutron How to upload an image to Glance How to create a new flavor in Nova How to update default Nova and Neutron quotas How to create a new tenant in Keystone How to create a new user in Keystone How to manage roles in Keystone How to create a tenant network in Neutron How to create a router in Neutron How to manage a router’s interfaces in Neutron How to update security groups in Neutron How to upload an RSA key-pair to the project How to allocate floating IPs to the project How to launch an instance from an image in Nova How to associate floating IPs with instances How to create a new volume in Cinder How to attach the volume to the instance How to take a snapshot of the instance How to take a snapshot of the volume How to launch an instance from a snapshot in Nova How to create a volume from 
snapshot in Cinder
417024 Apache Mahout for Developers 14 hours Audience Developers involved in projects that use machine learning with Apache Mahout. Format Hands-on introduction to machine learning. The course is delivered in a lab format based on real-world practical use cases. Implementing Recommendation Systems with Mahout Introduction to recommender systems Representing recommender data Making recommendations Optimizing recommendations Clustering Basics of clustering Data representation Clustering algorithms Clustering quality improvements Optimizing clustering implementation Applications of clustering in the real world Classification Basics of classification Classifier training Classifier quality improvements
464096 Big Data Architect 35 hours Day 1 - provides a high-level overview of essential Big Data topic areas. The module is divided into a series of sections, each of which is accompanied by a hands-on exercise. Day 2 - explores a range of topics relating to analysis practices and tools for Big Data environments. It does not get into implementation or programming details, but instead keeps coverage at a conceptual level, focusing on topics that enable participants to develop a comprehensive understanding of the common analysis functions and features offered by Big Data solutions. Day 3 - provides an overview of the fundamental and essential topic areas relating to Big Data solution platform architecture. It covers Big Data mechanisms required for the development of a Big Data solution platform and architectural options for assembling a data processing platform. Common scenarios are also presented to provide a basic understanding of how a Big Data solution platform is generally used. Day 4 - builds upon Day 3 by exploring advanced topics relating to Big Data solution platform architecture. In particular, different architectural layers that make up the Big Data solution platform are introduced and discussed, including data sources, data ingress, data storage, data processing and security. Day 5 - covers a number of exercises and problems designed to test the delegates' ability to apply knowledge of topics covered on Days 3 and 4. 
Day 1 - Fundamental Big Data Understanding Big Data Fundamental Terminology & Concepts Big Data Business & Technology Drivers Traditional Enterprise Technologies Related to Big Data Characteristics of Data in Big Data Environments Dataset Types in Big Data Environments Fundamental Analysis and Analytics Machine Learning Types Business Intelligence & Big Data Data Visualization & Big Data Big Data Adoption & Planning Considerations Day 2 - Big Data Analysis & Technology Concepts Big Data Analysis Lifecycle (from business case evaluation to data analysis and visualization) A/B Testing, Correlation Regression, Heat Maps Time Series Analysis Network Analysis Spatial Data Analysis Classification, Clustering Outlier Detection Filtering (including collaborative filtering & content-based filtering) Natural Language Processing Sentiment Analysis, Text Analytics File Systems & Distributed File Systems, NoSQL Distributed & Parallel Data Processing, Processing Workloads, Clusters Cloud Computing & Big Data Foundational Big Data Technology Mechanisms Day 3 - Fundamental Big Data Architecture New Big Data Mechanisms, including ... Security Engine Cluster Manager  Data Governance Manager Visualization Engine Productivity Portal Data Processing Architectural Models, including ... Shared-Everything and Shared-Nothing Architectures Enterprise Data Warehouse and Big Data Integration Approaches, including ... Series Parallel Big Data Appliance Data Virtualization Architectural Big Data Environments, including ... ETL  Analytics Engine Application Enrichment Cloud Computing & Big Data Architectural Considerations, including ... how Cloud Delivery and Deployment Models can be used to host and process Big Data Solutions Day 4 - Advanced Big Data Architecture Big Data Solution Architectural Layers including ... 
Data Sources, Data Ingress and Storage, Event Stream Processing and Complex Event Processing, Egress, Visualization and Utilization, Big Data Architecture and Security, Maintenance and Governance Big Data Solution Design Patterns, including ... Patterns pertaining to Data Ingress, Data Wrangling, Data Storage, Data Processing, Data Analysis, Data Egress, Data Visualization Big Data Architectural Compound Patterns Day 5 - Big Data Architecture Lab Incorporates a set of detailed exercises that require delegates to solve various inter-related problems, with the goal of fostering a comprehensive understanding of how different data architecture technologies, mechanisms and techniques can be applied to solve problems in Big Data environments.
296689 Programming with Big Data in R 21 hours Introduction to Programming Big Data with R (pbdR) Setting up your environment to use pbdR Scope and tools available in pbdR Packages commonly used with Big Data alongside pbdR Message Passing Interface (MPI) Using pbdR MPI Parallel processing Point-to-point communication Send Matrices Summing Matrices Collective communication Summing Matrices with Reduce Scatter / Gather Other MPI communications Distributed Matrices Creating a distributed diagonal matrix SVD of a distributed matrix Building a distributed matrix in parallel Statistics Applications Monte Carlo Integration Reading Datasets Reading on all processes Broadcasting from one process Reading partitioned data Distributed Regression Distributed Bootstrap
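The Monte Carlo integration application listed above has a very small serial core. This sketch shows it in Python rather than pbdR, estimating pi by sampling the unit square; in pbdR the sample count would be split across MPI ranks and the hit counts combined with a reduce, but the estimator itself is the same.

```python
import random

def mc_pi(n, seed=42):
    """Estimate pi by Monte Carlo integration over the unit square:
    the fraction of random points falling inside the quarter circle
    approximates pi/4. A fixed seed makes the run reproducible."""
    rnd = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rnd.random() ** 2 + rnd.random() ** 2 <= 1.0)
    return 4.0 * hits / n

print(mc_pi(100_000))  # close to 3.1416
```

The error shrinks roughly as 1/sqrt(n), which is why the course pairs this example with parallel processing: more samples per unit time means a tighter estimate.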
806637 Cassandra for Developers 21 hours This course will introduce Cassandra – a popular NoSQL database. It will cover Cassandra principles, architecture and data model. Students will learn data modeling in CQL (Cassandra Query Language) in hands-on, interactive labs. This session also discusses Cassandra internals and some admin topics. Duration: 3 days Audience: Developers Section 1: Introduction to Big Data / NoSQL NoSQL overview CAP theorem When is NoSQL appropriate Columnar storage NoSQL ecosystem Section 2: Cassandra Basics Design and architecture Cassandra nodes, clusters, datacenters Keyspaces, tables, rows and columns Partitioning, replication, tokens Quorum and consistency levels Labs: interacting with Cassandra using CQLSH Section 3: Data Modeling – part 1 Introduction to CQL CQL Datatypes Creating keyspaces & tables Choosing columns and types Choosing primary keys Data layout for rows and columns Time to live (TTL) Querying with CQL CQL updates Collections (list / map / set) Labs: various data modeling exercises using CQL; experimenting with queries and supported data types Section 4: Data Modeling – part 2 Creating and using secondary indexes Composite keys (partition keys and clustering keys) Time series data Best practices for time series data Counters Lightweight transactions (LWT) Labs: creating and using indexes; modeling time series data Section 5: Data Modeling Labs: Group design session multiple use cases from various domains are presented students work in groups to come up with designs and models discuss various designs, analyze decisions Lab: implement one of the scenarios Section 6: Cassandra drivers Introduction to the Java driver CRUD (Create / Read / Update / Delete) operations using the Java client Asynchronous queries Labs: using the Java API for Cassandra Section 7: Cassandra Internals understand Cassandra design under the hood sstables, memtables, commit log read path / write path caching vnodes Section 8: Administration Hardware 
selection Cassandra distributions Cassandra best practices (compaction, garbage collection) troubleshooting tools and tips Lab: students install Cassandra, run benchmarks Section 9: Bonus Lab (time permitting) Implement a music service like Pandora / Spotify on Cassandra
417006 Hadoop Administration on MapR 28 hours Audience: This course is intended to demystify big data / Hadoop technology and to show that it is not difficult to understand. Big Data Overview: What is Big Data Why Big Data is gaining popularity Big Data Case Studies Big Data Characteristics Solutions for working with Big Data Hadoop & Its components: What Hadoop is and what its components are Hadoop architecture and the characteristics of data it can handle/process Brief on Hadoop history, companies using it and why they started using it Hadoop framework & its components – explained in detail What HDFS is, and reads/writes to the Hadoop Distributed File System How to set up a Hadoop cluster in different modes: stand-alone / pseudo / multi-node cluster (this includes setting up a Hadoop cluster in VirtualBox/VMware, network configurations that need to be carefully looked into, running Hadoop daemons and testing the cluster) What the MapReduce framework is and how it works Running MapReduce jobs on a Hadoop cluster Understanding replication, mirroring and rack awareness in the context of Hadoop clusters Hadoop Cluster Planning: How to plan your Hadoop cluster Understanding hardware-software to plan your Hadoop cluster Understanding workloads and planning a cluster to avoid failures and perform optimally What is MapR and why MapR: Overview of MapR and its architecture Understanding & working of the MapR Control System, MapR Volumes, snapshots & mirrors Planning a cluster in the context of MapR Comparison of MapR with other distributions and Apache Hadoop MapR installation and cluster deployment Cluster Setup & Administration: Managing services, nodes, snapshots, mirror volumes and remote clusters Understanding and managing nodes Understanding Hadoop components, installing Hadoop components alongside MapR services Accessing data on the cluster, including via NFS Managing services & nodes. 
Managing data by using volumes, managing users and groups, managing & assigning roles to nodes, commissioning/decommissioning of nodes, cluster administration and performance monitoring, configuring/analyzing and monitoring metrics to monitor performance, configuring and administering MapR security. Understanding and working with M7 – native storage for MapR tables. Cluster configuration and tuning for optimum performance. Cluster upgrade and integration with other setups: Upgrading the software version of MapR and types of upgrade. Configuring a MapR cluster to access an HDFS cluster. Setting up a MapR cluster on Amazon Elastic MapReduce. All the above topics include demonstrations and practice sessions for learners to have hands-on experience of the technology.
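The MapReduce flow described in the outline above (map, shuffle, reduce) can be imitated in a few lines of plain Python as a word count, the canonical example. This is only a single-process illustration of the three phases; real Hadoop or MapR runs the same phases distributed across the cluster.

```python
from itertools import groupby

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    return [(w.lower(), 1) for w in line.split()]

def shuffle(pairs):
    """Shuffle: group intermediate (key, value) pairs by key,
    as Hadoop's sort-and-shuffle stage does between map and reduce."""
    pairs.sort(key=lambda kv: kv[0])
    return {k: [v for _, v in grp]
            for k, grp in groupby(pairs, key=lambda kv: kv[0])}

def reduce_phase(grouped):
    """Reduce: sum the values for each key to get the word counts."""
    return {k: sum(vs) for k, vs in grouped.items()}

lines = ["big data big cluster", "data node data"]
pairs = [kv for ln in lines for kv in map_phase(ln)]
print(reduce_phase(shuffle(pairs)))
# {'big': 2, 'cluster': 1, 'data': 3, 'node': 1}
```

On a cluster, each mapper processes one HDFS block, the shuffle moves pairs across the network so that all values for a key land on one reducer, and rack awareness (covered above) influences where those tasks are scheduled.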
238325 Big Data Hadoop Administration Training 21 hours This course provides a full understanding of all the steps needed to operate and maintain a Hadoop cluster. It covers everything from hardware specification, system installation and configuration, through load balancing, tuning, diagnostics and troubleshooting of the deployment. The course is dedicated to administrators who will build and/or maintain a Hadoop cluster. Training materials: Student Guide, Lab Guide. Apache Hadoop and HDFS Loading data into Hadoop YARN and MapReduce Planning your own cluster Installation and initial configuration of a Hadoop cluster Installation and configuration of Hive, Impala and Pig Hadoop clients Hadoop distributions and how to choose one for yourself Advanced cluster configuration Hadoop security Managing and scheduling jobs Cluster maintenance Cluster troubleshooting and monitoring Integrating Hadoop with data integration tools (e.g. SAS Data Integration Studio, Informatica PC, IBM Data Stage, Oracle Data Integrator, SQL Server Integration Services, Ab Initio)

Discounted Courses

Course Location Course Date Course Price [Remote/On-site]
Programowanie w WPF 4.5 Warszawa, ul. Złota 3/11 Mon., 2016-09-05 09:00 2809PLN / 1805PLN
Java Spring Szczecin, ul. Małopolska 23 Mon., 2016-09-05 09:00 7039PLN / 5044PLN
Tworzenie aplikacji internetowych w języku PHP Szczecin, ul. Małopolska 23 Tue., 2016-09-06 09:00 2688PLN / 2081PLN
Building Web Apps using the MEAN stack Szczecin, ul. Małopolska 23 Mon., 2016-09-12 09:00 4388PLN / 3003PLN
Java Spring Gdańsk, ul. Powstańców Warszawskich 45 Mon., 2016-09-12 09:00 7039PLN / 5153PLN
Java Spring Poznań, Garbary 100/63 Mon., 2016-09-12 09:00 7039PLN / 4961PLN
MS Access - poziom średniozaawansowany Bydgoszcz, ul. Dworcowa 94 Tue., 2016-09-13 09:00 1218PLN / 910PLN
Zarządzanie konfliktem Bielsko-Biała, Al. Armii Krajowej 220 Tue., 2016-09-13 09:00 2112PLN / 1315PLN
Java Spring Warszawa, ul. Złota 3/11 Mon., 2016-09-19 09:00 7039PLN / 4961PLN
Java Performance Tuning Gdynia, ul. Ejsmonda 2 Mon., 2016-09-19 09:00 4150PLN / 2866PLN
Java Spring Wrocław, ul.Ludwika Rydygiera 2a/22 Mon., 2016-09-19 09:00 7039PLN / 4961PLN
Oracle 11g - Programowanie w PL/SQL II Wrocław, ul.Ludwika Rydygiera 2a/22 Mon., 2016-09-26 09:00 2363PLN / 1785PLN
BPMN 2.0 dla Analityków Biznesowych Wrocław, ul.Ludwika Rydygiera 2a/22 Tue., 2016-09-27 09:00 3110PLN / 2337PLN
ITIL® Foundation Certificate in IT Service Management Warszawa, ul. Złota 3/11 Mon., 2016-10-10 09:00 2639PLN / 2076PLN
Visual Basic for Applications (VBA) w Excel - poziom zaawansowany Wrocław, ul.Ludwika Rydygiera 2a/22 Mon., 2016-10-10 09:00 1689PLN / 1296PLN
Prognozowanie Rynku Poznań, Garbary 100/63 Thu., 2016-10-13 09:00 2936PLN / 2112PLN
ITIL® Foundation Certificate in IT Service Management Łódź, ul. Tatrzańska 11 Mon., 2016-10-17 09:00 2639PLN / 2160PLN
Microsoft Office Excel - efektywna praca z arkuszem Rzeszów, Plac Wolności 13 Tue., 2016-10-18 09:00 918PLN / 843PLN
Wdrażanie efektywnych strategii cenowych Poznań, Garbary 100/63 Wed., 2016-10-26 09:00 1427PLN / 1093PLN
Agile Project Management with Scrum Kraków, ul. Rzemieślnicza 1 Wed., 2016-11-02 09:00 1746PLN / 1449PLN
Visual Basic for Applications (VBA) w Excel - poziom zaawansowany Białystok, ul. Malmeda 1 Mon., 2016-11-14 09:00 1689PLN / 1413PLN
Programowanie w języku Python Szczecin, ul. Małopolska 23 Tue., 2016-11-15 09:00 5790PLN / 3824PLN
Techniki graficzne (Adobe Photoshop, Adobe Illustrator) Wrocław, ul.Ludwika Rydygiera 2a/22 Tue., 2016-12-06 09:00 1963PLN / 1470PLN


