The relentless advance of DNA typing capabilities leads to complications and concerns. It is one thing for someone to be able to obtain your ABO blood type from a tiny spot of blood and another thing to know your eye color and ancestry. Some of the concerns are due to misconceptions, but others can pit personal privacy against perceived security. Practices and policies have not caught up with capability. We highlight a few of these current dilemmas in this chapter.
The last chapter discussed how peaks in an instrument output are converted into a DNA profile and how the random match probability is calculated using the product rule. Now we delve into how these profiles are analyzed and interpreted. Once a DNA profile has been developed from crime-scene evidence, it is compared to the profile(s) from known reference samples. These include elimination samples and samples from a person or persons of interest. If these comparisons do not provide helpful information, the profile can be submitted to a DNA database to search for investigative leads. Our focus is on DNA samples from a single person or simple mixtures such as a well-separated sample from a sexual assault case. Complex and low-level mixtures are much more challenging. We tackle those in the next chapter using the foundation we will build in this one.
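To recall what the product rule from the last chapter amounts to in practice, here is a minimal sketch: the random match probability is the product of the observed genotype's frequency at each independently inherited locus. The loci named below are common STR loci, but the frequencies are hypothetical and purely illustrative.

```python
# Product rule for a random match probability (RMP): multiply the frequency of
# the observed genotype at each independent locus. Frequencies are hypothetical.
from math import prod

genotype_frequencies = {
    "D8S1179": 0.067,
    "D21S11": 0.041,
    "TH01": 0.085,
}

rmp = prod(genotype_frequencies.values())
print(f"RMP = {rmp:.6g}  (about 1 in {1 / rmp:,.0f})")
```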
In the last chapter, we discussed the first method (RFLP) used in DNA typing. This procedure targeted relatively long DNA fragments (VNTRs) containing many repeated units of a base pair sequence. We ended by noting that they were not amenable to automation and therefore not destined for widespread forensic applications. However, by the early 1990s, many factors coalesced to set the stage for a leap in DNA typing capabilities. For example, the forensic and legal community had adjusted to DNA evidence, and analysts had moved from serological techniques such as ABO to genetic typing utilizing multiple DNA markers. Additionally, researchers in molecular biology, including in genomic sequencing, had identified many shorter repeat sequences that exhibited variation among individuals in a population.
This chapter kicks off with an example of a simple program in Python. Through this example, we will delve right away into the syntax of this popular programming language. The example involves quite a few technical details, most of which will probably be new to anyone without prior Python experience. The rest of the chapter then walks the reader briefly yet methodically through basic Python syntax. By the end of this chapter, you will be able to write simple programs in Python to solve various problems of a computational nature.
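As a taste of what such a simple program can look like, here is a minimal sketch; the specific task chosen here (computing the GC content of a short DNA sequence) is our illustrative choice, not necessarily the example the chapter itself opens with.

```python
# A minimal example of a simple Python program. The task (counting the GC
# content of a DNA sequence) is illustrative only.

sequence = "ATGCGCGTATTCCGGA"           # a short DNA sequence, stored as a string

gc_count = sequence.count("G") + sequence.count("C")
gc_fraction = gc_count / len(sequence)  # fraction of bases that are G or C

print("GC content:", round(gc_fraction, 3))
```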
A graph is defined by two sets. One set, the nodes, describes a collection of objects, for example biological species such as human, whale, mountain lion, crow, pigeon, sea gull, Arabian surgeonfish, zebrafish, and anemonefish. The second set, the edges, comprises pairs of nodes, such as {human, mountain lion}, {pigeon, sea gull}, and {Arabian surgeonfish, anemonefish}. Each pair represents a meaningful connection between two objects. For example, two species may be connected because they belong to the same taxonomic family, live on the same continent, are active in the same part of the day, and so on. Each such feature creates a different graph, because different properties yield different sets of pairs (edges).
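One common way to represent such a graph in Python is a dictionary mapping each node to the set of its neighbors. The sketch below uses the example species and edges from the text above; it is illustrative only, and the pairings do not amount to a biological claim.

```python
# An undirected graph as a dictionary: node -> set of neighboring nodes.
# Nodes and edges follow the example in the text; isolated nodes get empty sets.

graph = {
    "human": {"mountain lion"},
    "mountain lion": {"human"},
    "pigeon": {"sea gull"},
    "sea gull": {"pigeon"},
    "Arabian surgeonfish": {"anemonefish"},
    "anemonefish": {"Arabian surgeonfish"},
    "whale": set(),
    "crow": set(),
    "zebrafish": set(),
}

print("Neighbors of 'human':", graph["human"])
```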
Computers and computer programs are harnessed these days to solve an extremely diverse range of problems. These include weather forecasting, scheduling, designing and controlling satellite trajectories, recognizing individual human faces, playing chess, trading in the stock market, diagnosing cancer cells, assembling automobiles and airplanes, and driving autonomous cars (this is obviously a very partial list). In many cases, computers outperform human experts working on the same problem. In the coming decades, we may expect computers to gradually replace medical doctors (completely or partially), express emotions, compose high-quality music, and so on. Given these amazing feats, one wonders whether computers are omnipotent: can computers solve every problem, provided it is well defined?
In this chapter, we introduce the notion of graph algorithms, which are simply algorithms that operate on graphs. There are many such algorithms, aimed at solving a wide range of problems. We will focus on one such problem, the shortest paths problem, which several algorithms solve under different constraints. We will present the well-known breadth-first search (BFS) algorithm, which solves a simple version of that problem. This algorithm will be explained in detail and implemented in Python. We will conclude the chapter by mentioning additional common problems in graph theory.
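For a flavor of the algorithm, here is a minimal BFS sketch that computes shortest distances in an unweighted graph. It assumes an adjacency-dictionary representation and is an illustrative implementation, not the book's exact code.

```python
# Breadth-first search (BFS) for shortest distances in an unweighted graph.
from collections import deque

def bfs_shortest_distances(graph, source):
    """Return a dict mapping each reachable node to its distance (in edges) from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:        # first visit = shortest distance
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

# Example on a small, hypothetical graph:
g = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
print(bfs_shortest_distances(g, "a"))            # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```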
Much of the initial work in bioinformatics centered on biological sequences of medium length, such as genes and protein sequences. Typical problems were finding common genomic motifs, e.g., conserved sequence motifs in the promoter regions of a gene family, or looking for sequence similarity among proteins of different species. These tasks, and many others, have significant biological implications. Yet the key to their solution lies in the so-called area of stringology, the computer science jargon for the study of string algorithms and properties.
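As a tiny illustration of this kind of string task, the sketch below locates every occurrence of a motif in a DNA string. The sequence and motif are hypothetical, and this is an illustrative example rather than code from the book.

```python
# Locate all start positions of a motif within a sequence string.

def find_motif(sequence, motif):
    """Return the start positions of every occurrence of motif in sequence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)  # continue searching past this hit
    return positions

promoter = "TATAATGCGCTATAATCC"                  # hypothetical sequence
print(find_motif(promoter, "TATAAT"))            # [0, 10]
```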
Computers allow us to store an enormous amount of data. Data come in several types, such as numbers, sound, and images. But how are these types of data represented in the computer’s memory?
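As a small hint at the answer, every type of data is ultimately stored as bits, and Python lets us peek at the numeric codes involved. The snippet below is illustrative only.

```python
# Everything in memory ultimately boils down to numbers, and hence to bits.

n = 77
print(bin(n))                  # '0b1001101': the integer 77 as a binary numeral

c = "A"
print(ord(c), bin(ord(c)))     # 65 '0b1000001': a character via its numeric code

data = "hi".encode("utf-8")    # text as a sequence of byte values
print(list(data))              # [104, 105]
```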
Modern biology has been undergoing a dramatic revolution in recent decades. Enormous amounts of data are produced at an unprecedented rate. These data come in various forms, such as gene and protein expression levels; DNA, RNA, and protein sequencing data; and high-quality biological and medical images. One consequence of this “data explosion” is that computational methods are increasingly being used in life science research. Computational methods in this context are not the mere use of tools, but the integration of computational and algorithmic thinking into lab-experiment design; into data generation, integration, and analysis; and into modeling and simulation. It is becoming widely recognized that such thinking skills should be incorporated into the standard training of life scientists in the 21st century.
In this chapter, we will define Turing’s Halting Problem. We will then describe Turing’s proof that the Halting Problem cannot be solved by any computer, ever. This was the first problem proven to be, fundamentally, beyond the capabilities of any computer. This result subsequently served as a springboard to prove that a whole, infinite class of other problems are unsolvable as well.
This chapter begins with a definition of what a graph is. While the definition is rather abstract, it is general enough to make graph theory relevant and applicable to a wide range of systems, in a variety of contexts. We will also see how to represent graphs in Python (or any other programming language). Such a representation allows computerized inspection of many graph properties, as well as the implementation of various algorithms on graphs. By the end of this chapter, you will be familiar with many basic notions in this field. The last two sections introduce the notions of clusters and clustering, and of hierarchical clustering in the context of phylogenetic trees. We note that the treatment of these two topics here barely scratches the surface; both are deep subjects in their own right, covered by a vast and diverse literature.
In the previous chapter, we studied well-defined computational problems that cannot be solved by any computer program. Consider now a setting where we are given a well-defined computational problem that is solvable by some computer program, yet any such program takes a very long time to complete: long enough that by the time the execution terminates, the solution is no longer relevant.
In this chapter, you will encounter one of the most fundamental notions in CS – the complexity of computations. We take a rather informal approach in presenting this topic, with an emphasis on intuition. The complexity of algorithms has two flavors – time complexity, which relates to running time, and space (memory) complexity, which reflects the memory allocation requirements of an algorithm. In this chapter, and in most other chapters as well, we will focus primarily on time complexity (except for some cases in which memory poses a real limitation). You are not expected to become an expert in analyzing the complexity of algorithms. However, by the end of this chapter you should be able to understand how running time depends on an algorithm’s input size and to estimate empirically an algorithm’s actual running time. Later in the book, in Chapter 11 (Mission Infeasible), we will get acquainted with the notion of complexity from another perspective – computational problems that can be solved in principle, but for which no efficient solutions are known.
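One concrete way to estimate running time empirically is simply to time an operation on inputs of growing size. The sketch below does this for Python's built-in sorting; it is an illustrative measurement, not the book's own experiment, and actual timings will vary from machine to machine.

```python
# Empirically timing an algorithm (here, built-in sorting) on growing inputs.
import random
import time

for n in (10_000, 20_000, 40_000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    sorted(data)                                  # the operation being timed
    elapsed = time.perf_counter() - start
    print(f"n = {n:>6}: {elapsed:.4f} seconds")
```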