Kod kursu
rintrob
Czas trwania
28 godzin
szkolenie zdalne: 4 lub 8 dni
szkolenie stacjonarne: 4 dni
Opis
R to darmowy język programowania open source do obliczeń statystycznych, analizy danych i grafiki. R jest używany przez coraz większą liczbę menedżerów i analityków danych w korporacjach i środowiskach akademickich. R znalazł również zwolenników wśród statystyków, inżynierów i naukowców bez umiejętności programowania komputerowego, którzy uważają go za łatwy w użyciu. Jego popularność wynika z coraz częstszego wykorzystywania eksploracji danych do różnych celów, takich jak ustalanie cen reklam, szybsze znajdowanie nowych leków lub dostosowywanie modeli finansowych. R ma szeroką gamę pakietów do eksploracji danych.
Machine Translated
Plan Szkolenia
I. Introduction and preliminaries
1. Overview
- Making R more friendly, R and available GUIs
- Rstudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
- Good programming practice: Self-contained scripts, good readability e.g. structured scripts, documentation, markdown
- installing packages; CRAN and Bioconductor
2. Reading data
- Txt files (read.delim)
- CSV files
3. Simple manipulations; numbers and vectors + arrays
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Arrays
- Array indexing. Subsections of an array
- Index matrices
- The array() function + simple operations on arrays e.g. multiplication, transposition
- Other types of objects
4. Lists and data frames
- Lists
- Constructing and modifying lists
- Concatenating lists
- Data frames
- Making data frames
- Working with data frames
- Attaching arbitrary lists
- Managing the search path
5. Data manipulation
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, (), with arrays
- Character manipulation, stringr package
- short intro into grep and regexpr
6. More on Reading data
- XLS, XLSX files
- readr and readxl packages
- SPSS, SAS, Stata,… and other formats data
- Exporting data to txt, csv and other formats
6. Grouping, loops and conditional execution
- Grouped expressions
- Control statements
- Conditional execution: if statements
- Repetitive execution: for loops, repeat and while
- intro into apply, lapply, sapply, tapply
7. Functions
- Creating functions
- Optional arguments and default values
- Variable number of arguments
- Scope and its consequences
8. Simple graphics in R
- Creating a Graph
- Density Plots
- Dot Plots
- Bar Plots
- Line Charts
- Pie Charts
- Boxplots
- Scatter Plots
- Combining Plots
II. Statistical analysis in R
1. Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
2. Testing of Hypotheses
- Tests about a Population Mean
- Likelihood Ratio Test
- One- and two-sample tests
- Chi-Square Goodness-of-Fit Test
- Kolmogorov-Smirnov One-Sample Statistic
- Wilcoxon Signed-Rank Test
- Two-Sample Test
- Wilcoxon Rank Sum Test
- Mann-Whitney Test
- Kolmogorov-Smirnov Test
3. Multiple Testing of Hypotheses
- Type I Error and FDR
- ROC curves and AUC
- Multiple Testing Procedures (BH, Bonferroni etc.)
4. Linear regression models
- Generic functions for extracting model information
- Updating fitted models
- Generalized linear models
- Families
- The glm() function
- Classification
- Logistic Regression
- Linear Discriminant Analysis
- Unsupervised learning
- Principal Components Analysis
- Clustering Methods(k-means, hierarchical clustering, k-medoids)
5. Survival analysis (survival package)
- Survival objects in r
- Kaplan-Meier estimate, log-rank test, parametric regression
- Confidence bands
- Censored (interval censored) data analysis
- Cox PH models, constant covariates
- Cox PH models, time-dependent covariates
- Simulation: Model comparison (Comparing regression models)
6. Analysis of Variance
- One-Way ANOVA
- Two-Way Classification of ANOVA
- MANOVA
III. Worked problems in bioinformatics
- Short introduction to limma package
- Microarray data analysis workflow
- Data download from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
- Data processing (QC, normalisation, differential expression)
- Volcano plot
- Custering examples + heatmaps