INTD 8065 - Data Analysis in Cancer Research

Syllabus

Coordinator

| Name | Humberto Ortiz-Zuazaga |
| Office | NCL A-159 |
| Laboratory | NCL A-158 |
| Telephone | 787-764-0000 x7430 |
| Email | humberto.ortiz@upr.edu |
| Web page | http://ccom.uprrp.edu/~humberto/ |
| Office hours | Monday, Wednesday 8:00-9:30 AM |
| | Tuesday 3:30-5:00 PM |
| | or by appointment |

Description

Through lectures, group discussions and other active learning strategies, this course will introduce the students to statistical and computing methods for observational studies and clinical trials. The students will acquire knowledge in basic concepts of Data Analysis, utilizing Biostatistics and Bioinformatics tools, and applying these to biomedical/translational cancer and population sciences research.

Pre-requisites

  • Calculus I or equivalent

  • Basic course in statistics or biostatistics

Objectives

At the end of the course, students will be able to:

  1. Apply the basic methods of data analysis and the elementary concepts of Biostatistics and Bioinformatics in applications related to Cancer Research.

  2. Use flexible statistical software (such as R and Bioconductor packages) to apply data-analytic techniques in Cancer Research and draw inferences from the results.

  3. Develop case studies utilizing cancer data.

Course schedule

Class will meet Fridays from 1:00 to 3:30 PM in the Medical Sciences Campus, Nursing Professions Building, Room 318.

Tentative course calendar

| Date | Topic | Book Chapter * | Additional Reading |
| Jan 22 | Introduction | STAT Ch 1-2, DATA Pg 4-17 | |
| Jan 29 | How to Generate Descriptive Statistics | STAT Ch 4, DATA Pg 92-137 | Spriestersbach, A., Röhrig, B., du Prel, J.B., Gerhold-Ay, A., Blettner, M. (2009). Descriptive statistics: The specification of statistical measures and their presentation in tables and graphs. Part 7 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 106(36):578-83. |
| Feb 5 | How to Generate Inferential Statistics | DATA Pg 18-91 | |
| Feb 12 | Testing Hypotheses | STAT Ch 5; DATA Pg 18-91 | du Prel, J.B., Röhrig, B., Hommel, G., Blettner, M. (2010). Choosing statistical tests: Part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(19):343-8. |
| Feb 19 | Calculation of Sample Size and Power | STAT Ch 9; DATA Pg 62-74 | Röhrig, B., du Prel, J.B., Wachtlin, D., Kwiecien, R., Blettner, M. (2010). Sample size calculation in clinical trials: Part 13 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(31-32):552-6. |
| Feb 26 | Linear Models | STAT Ch 6,11 | Schneider, A., Hommel, G., Blettner, M. (2010). Linear regression analysis: Part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(44):776-82. |
| Mar 4 | One Way Analysis of Variance | STAT Ch 7 | Victor, A., Elsässer, A., Hommel, G., Blettner, M. (2010). Judging a plethora of p-values: how to contend with the problem of multiple testing--Part 10 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(4):50-6. |
| Mar 11 | Contingency Tables and Log-linear Models I | STAT Ch 13,15 | |
| Mar 18 | Contingency Tables and Log-linear Models II | STAT Ch 13,15 | |
| Mar 25 | No class | | |
| Apr 1 | Introduction to Bioinformatics | | W. Huber, V.J. Carey, R. Gentleman, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods, 2015:12, 115. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4509590/ |
| Apr 8 | Single and Multiple Sequence Alignment | | (1) Needleman,S. and Wunsch,C. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 443-453, 1970. http://www.cise.ufl.edu/class/cis4930sp09rab/00052.pdf (2) Erik S. Wright. The Art of Multiple Sequence Alignment in R. University of Wisconsin. Madison, WI. October 13, 2015. https://www.bioconductor.org/packages/3.3/bioc/vignettes/DECIPHER/inst/doc/ArtOfAlignmentInR.pdf |
| Apr 15 | Statistical Methods for Analysis of Microarray Data | | Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf |
| Apr 22 | Gene Clustering Analysis | | K. S. Pollard and M. J. van der Laan. "Cluster Analysis of Genomic Data" in Gentleman, R., Carey, V.J., et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005. http://cbcb.umd.edu/~hcorrada/CFG/readings/Solutions_ch13.pdf |
| Apr 29 | Next Generation Sequencing | | Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. http://www.genomebiology.com/2014/15/2/R29 |
| May 6 | Bioinformatics Case Study | | |

* STAT: Book chapters from P. Dalgaard (2008) “Introductory Statistics with R” Springer.

* DATA: R. Irizarry and M. Love (2015) “Data Analysis for the Life Sciences” Leanpub. https://leanpub.com/dataanalysisforthelifesciences.

Instructional resources

The course will be hosted in Google Classroom. You need to sign in with a @upr.edu account.

Textbook

See the references and course calendar for course materials.

Software

We will use the R statistical software and Bioconductor. Both are open source, and you can install them on your own computers.
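As a minimal sketch of the installation step, assuming a recent R release is already installed from CRAN, the commands below install Bioconductor through the BiocManager package. The choice of limma as an example package is an assumption made here for illustration; the course may require different packages.

```r
# Install the BiocManager helper from CRAN if it is not already present.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the core Bioconductor packages for the running version of R.
BiocManager::install()

# Install one additional Bioconductor package.
# limma is an illustrative choice, not a stated course requirement.
BiocManager::install("limma")
```

Run these lines in an R console with an internet connection. Older R releases used a different installation procedure, so check the Bioconductor installation page if the commands fail.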

Evaluation

Students' work will be evaluated on a 100% basis using the standard curve.

  • Homework, 20% final grade
  • Project in Biostatistics, 35% final grade
  • Project in Bioinformatics, 35% final grade
  • Attendance, 10% final grade

Reasonable accommodations for students (Statement of PR Law 51)

Students with a health condition or situation that, according to the law, makes them eligible for reasonable accommodation have the right to submit a written application to the Dean, Associate Dean or Assistant Dean for Students Affairs of their Faculty, according to the procedures established in the document Submission Process for Reasonable Accommodation of the Medical Sciences Campus. This document may be obtained at the Deanship for Students Affairs of the MSC, at each Faculty, and the MSC web page. The application does not exempt students from complying with the academic requirements pertaining to the programs.

Academic integrity

The University of Puerto Rico promotes the highest standards of academic and scientific integrity. Article 6.2 of the UPR Student Bylaws (Certification JS 13 2009–2010) states that “academic dishonesty includes but is not limited to: fraudulent actions, obtaining grades or academic degrees using false or fraudulent simulations, copying totally or partially academic work from another person, plagiarizing totally or partially the work of another person, copying totally or partially responses from another person to examination questions, making another person to take any test, oral or written examination on his/hers behalf, as well as assisting or facilitating any person to incur in the aforementioned conduct”. Fraudulent conduct refers to “behavior with the intent to defraud, including but not limited to, malicious alteration or falsification of grades, records, identification cards or other official documents of the UPR or any other institution.” Any of these actions shall be subject to disciplinary sanctions in accordance with the disciplinary procedure, as stated in the existing UPR Student Bylaws.

DISCLAIMER: The above statement is an English translation, prepared at the Deanship of Academic Affairs of the Medical Sciences Campus, of certain parts of Article 6.2 of the UPR Student Bylaws “Reglamento General de Estudiantes de la Universidad de Puerto Rico”, (Certificación JS 13 2009-2010). It is in no way intended to be a legal substitute for the original document, written in Spanish.

Grading system

The grading system for this course is as follows:

90 - 100 A

80 - 89 B

70 - 79 C

Less than 70 F

Bibliography

  1. Dalgaard, P. (2008). Introductory Statistics with R. (2nd ed.). Springer.

  2. Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S. (Eds.). (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.

  3. Irizarry, R. and Love, M. (2015). Data Analysis for the Life Sciences. Leanpub.

  4. Spriestersbach, A., Röhrig, B., du Prel, J.B., Gerhold-Ay, A., Blettner, M. (2009). Descriptive statistics: The specification of statistical measures and their presentation in tables and graphs. Part 7 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 106(36):578-83.

  5. du Prel, J.B., Röhrig, B., Hommel, G., Blettner, M. (2010). Choosing statistical tests: Part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(19):343-8.

  6. Röhrig, B., du Prel, J.B., Wachtlin, D., Kwiecien, R., Blettner, M. (2010). Sample size calculation in clinical trials: Part 13 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(31-32):552-6.

  7. Victor, A., Elsässer, A., Hommel, G., Blettner, M. (2010). Judging a plethora of p-values: how to contend with the problem of multiple testing--Part 10 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(4):50-6.

  8. Schneider, A., Hommel, G., Blettner, M. (2010). Linear regression analysis: Part 14 of a series on evaluation of scientific publications. Dtsch Arztebl Int, 107(44):776-82.

Electronic resources

For each chapter, the book provides links to the R code used.