Course Description

Rapid developments in bio- and information technology are changing the way that biomedical scientists interact with data. Traditionally, data were the end result of laborious experimentation, and their interpretation mostly involved careful thought and background knowledge. Today, data are increasingly generated much earlier in the scientific workflow and are much larger in scale. Also, before the data can be interpreted, extensive computational processing is often necessary. Thus, the data deluge in biomedicine now requires mining and modeling on a large scale, i.e., biomedical data science.

This course aims to equip students with some of the concepts and skills relevant to biomedical data science, with an emphasis on bioinformatics, a sub-discipline of this broader field, through examples of mining and modeling of genomic and proteomic data. More specifically, bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, mining of functional genomics data sets, and machine learning approaches for data integration.

Course Survey

If you are taking the class, please fill this out by the first day of class (Jan 18th):

Overall Flow of the Class

(Module = Group of Lectures)


Discussion Section

Session      Time                    Location
Section 1    Thurs 1:00-2:00 PM      YSB352
Section 2    Fri 10:00-11:00 AM      BASS405
Section 3    Fri 10:00-11:00 AM      YSB352
Section 4    Fri 1:00-2:00 PM        YSB352

Different headings for this class (5 variants)


The course is geared toward CBB graduate students as well as advanced undergraduates and graduate students wishing to learn about the types of large-scale quantitative analysis that whole-genome sequencing and other forms of large-scale biological data will make possible. It would also be suitable for students from other fields, such as computer science, statistics, or physics, wanting to learn about an important new biological application for computation.

Students should have:

These can be fulfilled by: MBB 200 and Mathematics 115 or permission of the instructor.

Class materials

There is no textbook for this class. PPT slides will be available after the lectures. We recommend Biochemistry by Lubert Stryer for the biochemistry prerequisite.

Class Requirements

Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (half a page per article) should be submitted at the beginning of each discussion section.

In-class tests: Quizzes

Quizzes will comprise simple questions that you should be able to answer from the lectures plus the main readings.

For reference, please refer to the previous Quiz Archive.

Programming Assignments (Req’d for CBB and CS grad. students)

Non-programming Assignments

Pages from previous years

Class data dump