Computer Science for Bioinformatics (140.636)

1st term, 2018-2019
MWF 1:30-2:20 in W5008
F 10:30-11:20 in W4013

Instructor: Fernando Pineda

Course materials

  1. Lecture and Lab notes
  2. Books, references, and Cheat Sheets

Description

Please read ALL of this document carefully.

This course uses multiple programming languages to introduce skills and concepts needed to process and interpret data from high-throughput technologies in the biological sciences. The course focuses on generally applicable computer-science concepts rather than statistical or biological concepts. Lectures with live computer demonstrations and hands-on laboratories will be used to introduce key concepts. These will be reinforced and extended with weekly readings and programming exercises. Exercises and examples will draw heavily from biological sequence analysis, proteomics, genetics and computational biology. Occasional guest lecturers will present case studies. Students will be introduced to the wealth of bioinformatics and computational software-development resources available on the World Wide Web.

Students will be introduced to necessary fundamentals in computer science, including: (1) salient machine and network basics, (2) data representation, data structures, algorithms and complexity, (3) parsing and pattern matching, (4) programming languages (HTML, Perl, Python, SQL, regular expressions), (5) style and best practices, and (6) object-oriented programming.

Applied topics to be covered include: (1) biological sequence analysis, (2) middleware, (3) how to use scripts in bash, Perl and Python to manage and process datasets, (4) relational databases, including automated interaction with local (e.g. SQLite and MySQL) and biological (e.g. GenBank) databases, (5) high-performance computing, (6) parallel processing, and (7) simulation.

People

Name             Role                       Contact/Location                                            Office Hours
Fernando Pineda  Instructor                 Tel: 443-287-3673; fernando.pineda@jhu.edu; office: E3626   by appointment
Mark Miller      Computing Systems Manager  mmil116@jhu.edu

Prerequisites

Permission of the instructor AND either a previous course in computer science or computer programming experience. If you have never done any programming or if you have never used a command line on a Unix workstation, you will find this course to be VERY challenging. If this is the case, you may wish to register as an auditor or pass/fail instead of taking the course for a grade.

Homework and Grading Policy

Grades are based on four to five programming assignments and a final project. The programming assignments count for 70% of the grade and the final project counts for 30%. Each student is expected to coordinate with the instructor to select a suitable project based on the student’s interests. The final project must be presented in class, and working code must be demonstrated. Homework problems are generally awarded 2-5 points each. Programming assignments typically receive full credit if they are well documented and produce correct results when run by the instructor on the cluster. It is not sufficient that the code runs on the student’s machine. Grading of documentation and programming style is necessarily somewhat subjective. Note that not everything needed to complete the assignments will be presented in the lectures; assignments will require material from the readings as well as the lectures.

Homework is accepted electronically as HTML-formatted documents AND as working programs/scripts. No homework will be accepted via email or on paper. No late homework will be accepted: once the due date and time have passed, it will be impossible to submit homework electronically. (We will check the timestamps on the files!) For programming assignments, students may discuss ideas and approaches with others. However, programs and projects are to be completed independently and must be original work. The first block of comments in your Perl code should contain, at a minimum, the following items:

* Name of the program
* Your name and the date
* Assignment number
* Usage instructions for the program
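
As an illustration only, here is a minimal sketch of what such a header block might look like. It is written in Python (one of the course languages) rather than Perl; the same items carry over to Perl, which also uses `#` for comments. The program name, author, date, and assignment number are placeholders, and the GC-content calculation is a hypothetical example, not an actual assignment.

```python
#!/usr/bin/env python3
# Program:    gc_content.py   (hypothetical example name)
# Author:     Your Name       (placeholder)
# Date:       YYYY-MM-DD      (placeholder)
# Assignment: N               (placeholder)
# Usage:      python3 gc_content.py SEQUENCE
#             Prints the fraction of G and C bases in SEQUENCE,
#             e.g. python3 gc_content.py ATGC

import sys


def gc_content(sequence: str) -> float:
    """Return the fraction of characters in `sequence` that are G or C."""
    seq = sequence.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


def main(argv):
    """Print the GC fraction of the single argument, or a usage message."""
    if len(argv) != 1:
        print("Usage: python3 gc_content.py SEQUENCE", file=sys.stderr)
        return
    print(f"{gc_content(argv[0]):.3f}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the usage line in the header in sync with the message the program prints makes it easier for whoever runs the code on the cluster to see how it is meant to be invoked.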

Final Project

You should meet with the instructor to decide on a final project no later than four weeks before the end of the course. A written proposal (a paragraph or two describing the project), posted on your final project web page, is due three weeks before the end of the course. I find that the best projects are those that come from active research projects, so it is best to consult with your advisor or a faculty member in your department for potential projects. A suitable project should be about as much work as two or three homework assignments. The final project is graded on how well it shows mastery of the subject matter taught in the course. For example, a project that makes effective use of modules, data structures (e.g. references), regular expressions or databases will get more points than a program that uses just rudimentary Perl. Documentation and maintainability are also important. Note: A project that solves an interesting and useful problem will also get more points than one that is just a homework exercise. (This is graduate school, after all.) Here are some example projects from previous years: FinalProjects

Schedule

Day  Month-Day  Venue  Topic                                                    Remarks
Wed  Sep-05     W5008  Course Mechanics & Intro to the Basic Basics             Bring a laptop
Fri  Sep-07     W4013  Account set up and Intro to the cluster                  Bring a laptop
Fri  Sep-07     W5008  Basic Linux & Command line
Mon  Sep-10     W5008  Perl: Introduction and rationale
Wed  Sep-12     W5008  Perl: Strings & Variables, lists and arrays
Fri  Sep-14     W4013  Perl: hashes
Fri  Sep-14     W5008  Perl: Control statements
Mon  Sep-17     W5008  Regular Expressions
Wed  Sep-19     W5008  Perl: Scope, functions & subroutines
Fri  Sep-21     W4013  Perl: Scope & packages
Fri  Sep-21     W5008  Perl: packages & Modules
Mon  Sep-24     W5008  Data structures & Computational complexity
Wed  Sep-26     W5008  Perl: Dynamic Programming & Sequence alignment
Fri  Sep-28     W2033  Perl: Object Oriented Programming
Fri  Sep-28     W5008  Python: Introduction & Rationale
Mon  Oct-01     W5008  Python
Wed  Oct-03     W5008  Python
Fri  Oct-05     W4013  Python
Fri  Oct-05     W5008  Relational Databases: Introduction, Rationale & SQL
Mon  Oct-08     W5008  Relational Databases: SQL & Joins
Wed  Oct-10     W5008  Relational Databases & interaction with perl and python
Fri  Oct-12     W4013  Python: more about Objects & Classes
Fri  Oct-12     W5008  Python example: Agent-based simulation
Mon  Oct-15     W5008  TBD
Wed  Oct-17     W5008  High Performance Computing
Fri  Oct-19     W4013  High Performance Computing
Fri  Oct-19     W5008  High Performance Computing
Mon  Oct-22     W5008  TBD
Wed  Oct-24     W5008  Student Presentations
Fri  Oct-26     W4013  Student Presentations
Fri  Oct-26     W5008  Student Presentations