Chapter 10 covers the random forests algorithm for classification. Presented are also the impurity metrics applicable to splitting nodes in classification trees (Gini, entropy, and misclassification impurity), as well as permutation-based and impurity-based variable importance measures.
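As a minimal illustration of these impurity metrics (not code from the book; the class proportions below are made up), the three node-impurity measures can be computed as follows:

```python
import numpy as np

def gini(p):
    """Gini impurity for a vector of class proportions p (summing to 1)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Entropy impurity; 0 * log2(0) is taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def misclassification(p):
    """Misclassification impurity: 1 minus the majority-class proportion."""
    return 1.0 - np.max(np.asarray(p, dtype=float))

# Example: a node with 70% of its samples in class 1 and 30% in class 2.
proportions = [0.7, 0.3]
print(gini(proportions))              # 0.42
print(entropy(proportions))           # ~0.881
print(misclassification(proportions)) # 0.3
```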
Chapter 12 presents discriminant analysis – a classical (and powerful) supervised learning approach for classification. Discussed are Fisher’s discriminant analysis, as well as Gaussian linear, quadratic, and regularized discriminant analysis. The chapter concludes with a discussion of partial least squares discriminant analysis, which is still popular in some application areas, even though its application to high-dimensional data is likely to result in solutions that are suboptimal in terms of predictive ability and interpretability (alternative approaches are recommended).
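A brief sketch of how Gaussian linear and quadratic discriminant analysis might be fitted in practice, using scikit-learn on synthetic data; the data and settings are illustrative assumptions, not the chapter's examples:

```python
# Illustrative only: Gaussian LDA and QDA fitted to synthetic two-class data.
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Two Gaussian classes in five dimensions, differing only in their means.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(1.0, 1.0, size=(100, 5))])
y = np.repeat([0, 1], 100)

lda = LinearDiscriminantAnalysis().fit(X, y)     # linear decision boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # class-specific covariances
# Covariance shrinkage, one simple form of regularized discriminant analysis:
rda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

print("LDA accuracy:", lda.score(X, y))
print("QDA accuracy:", qda.score(X, y))
print("Shrinkage-LDA accuracy:", rda.score(X, y))
```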
Chapter 17 describes the second real-life study, whose goal is the identification of multivariate biomarkers for liver cancer. This study implements parallel recursive feature elimination experiments coupled with random forests and support vector machines. Included are also considerations for rebalancing class proportions. Three multivariate biomarkers for liver cancer have been identified. The study has been performed in an R environment, and R scripts for all of its steps are provided.
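A hedged sketch of recursive feature elimination driven by random forest importances, written in Python with scikit-learn rather than the study's actual R scripts; the data, target biomarker size, and elimination step are placeholders:

```python
# Hedged sketch (not the study's R code): recursive feature elimination with a
# random forest as the importance-providing estimator, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                           random_state=0)

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=10,   # target biomarker size (illustrative)
    step=0.1,                  # drop 10% of the remaining features per round
)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))
```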
Chapter 16 presents the first of the two real-life multivariate biomarker discovery studies included in the book. The goal of this study – which implements the method presented in Chapters 14 and 15 – is to identify the essential gene expression patterns and a multivariate biomarker common to multiple types of cancer. This study is based on the TCGA RNA-Seq data of 3,528 patients and 20,530 gene expression variables; the data represent five tumor types of five different tissues. A parsimonious multivariate biomarker (consisting of ten genes) with high sensitivity and specificity has been identified.
Chapter 9 presents support vector regression (SVR), a relatively new supervised learning algorithm for predictive regression modeling which – like random forests for regression – may outperform least-squares-based methods. Discussed are the ε-insensitive loss function used by SVR, the ε-tube concept, and algorithms for linear and nonlinear SVR.
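An illustrative sketch (not taken from the book) of the ε-insensitive loss and of a nonlinear SVR fitted with scikit-learn; all data and hyperparameters are placeholders:

```python
# Illustrative only: the epsilon-insensitive loss and a nonlinear (RBF) SVR.
import numpy as np
from sklearn.svm import SVR

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Residuals falling inside the epsilon-tube incur zero loss."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
pred = model.predict(X)
print("Mean epsilon-insensitive loss:", eps_insensitive_loss(y, pred).mean())
```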
Chapter 4 provides detailed coverage of methods for the evaluation of predictive models: methods applicable to regression models implementing estimation biomarkers, as well as methods for evaluating binary and multiclass classification models. The discussion of resampling techniques accentuates the danger of information leakage and emphasizes the paramount importance of avoiding internal validation. The discussion of metrics for the evaluation of classification biomarkers includes the proper and improper interpretation of sensitivity and specificity, illustrated by the example of a screening biomarker targeting a population with a low prevalence of the tested disease. For such biomarkers, the positive predictive value may be unacceptably low even when the biomarker has very high sensitivity and specificity. Also discussed are misclassification costs and how to incorporate them into cost-sensitive classification.
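A worked example of the low-prevalence point above, with made-up numbers: even at 99% sensitivity and 99% specificity, a prevalence of 0.1% yields a positive predictive value of roughly 9%.

```python
# Illustrative calculation (made-up numbers): positive predictive value from
# sensitivity, specificity, and disease prevalence via Bayes' rule.
def positive_predictive_value(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=0.001)
print(f"PPV = {ppv:.3f}")  # ~0.090: only about 9% of positive calls are true positives
```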
Multivariate biomarker discovery is increasingly important in biomedical research and is poised to become a crucial facet of personalized medicine. This will drive demand for a myriad of novel biomarkers representing distinct 'omic' biosignatures, allowing treatments to be selected and tailored to the individual characteristics of a particular patient. This concise and self-contained book covers all aspects of predictive modeling for biomarker discovery based on high-dimensional data, as well as modern data science methods for the identification of parsimonious and robust multivariate biomarkers for medical diagnosis, prognosis, and personalized medicine. It provides a detailed description of state-of-the-art methods for parallel multivariate feature selection and supervised learning algorithms for regression and classification, as well as methods for the proper validation of multivariate biomarkers and of the predictive models implementing them. This is an invaluable resource for scientists and students interested in bioinformatics, data science, and related areas.
With diploid organisms, one is interested not only in discovering variants but also in discovering to which of the two haplotypes each variant belongs. One would thus like to identify the variants that are co-located on the same haplotype, a process called haplotype phasing. Assume we have managed to do haplotype phasing for several individuals. It is then of interest to do haplotype matching, that is, to locate long contiguous blocks shared by multiple individuals. The chapter covers algorithms and complexity analysis of these key haplotype analysis tasks. A close connection between classical indexes and a tailored data structure called the positional BWT index is established.
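A minimal sketch, in the spirit of the positional BWT, of the positional prefix arrays that make haplotype matching efficient; it assumes binary (0/1) haplotypes and is not the chapter's full construction:

```python
# Minimal sketch (assumes binary 0/1 haplotypes): positional prefix arrays.
# At each site, haplotypes are kept sorted by their reversed prefixes, so
# haplotypes sharing long blocks ending at that site become adjacent, which
# is what makes haplotype matching efficient.
def positional_prefix_arrays(haplotypes):
    m = len(haplotypes)        # number of haplotypes
    n = len(haplotypes[0])     # number of sites
    order = list(range(m))
    arrays = [order[:]]
    for k in range(n):
        zeros, ones = [], []
        for h in order:        # stable partition by the allele at site k
            (zeros if haplotypes[h][k] == 0 else ones).append(h)
        order = zeros + ones
        arrays.append(order[:])
    return arrays

haps = [[0, 1, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 0, 1]]
print(positional_prefix_arrays(haps)[-1])  # [1, 0, 2]: haplotypes 0 and 2 become adjacent
```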
Analysing the content of a biological sequence can often be modeled as a segmentation problem. For example, one may wish to segment a genome into coding and non-coding regions, where only the former are translated into proteins. Statistical features of what genes usually look like can be used to derive an optimization framework. This process can be formalized through hidden Markov models, and the underlying segmentation problem can be solved using dynamic programming. This chapter introduces the key methods related to such optimization.
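A compact sketch of the dynamic programming involved: Viterbi decoding of a two-state ("coding" vs. "non-coding") hidden Markov model with illustrative, made-up parameters:

```python
# Viterbi decoding sketch: the most probable state path gives a segmentation.
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    # v[i][s]: best log-probability of any state path ending in state s at position i.
    v = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for i in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: v[i - 1][p] + log_trans[p][s])
            v[i][s] = v[i - 1][best_prev] + log_trans[best_prev][s] + log_emit[s][obs[i]]
            back[i][s] = best_prev
    # Trace back the most probable state path (the segmentation).
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

states = ("coding", "noncoding")
lg = math.log
log_start = {"coding": lg(0.5), "noncoding": lg(0.5)}
log_trans = {"coding": {"coding": lg(0.9), "noncoding": lg(0.1)},
             "noncoding": {"coding": lg(0.1), "noncoding": lg(0.9)}}
log_emit = {"coding": {b: lg(p) for b, p in zip("ACGT", (0.2, 0.3, 0.3, 0.2))},
            "noncoding": {b: lg(p) for b, p in zip("ACGT", (0.3, 0.2, 0.2, 0.3))}}
print(viterbi("ACGCGCGATATAT", states, log_start, log_trans, log_emit))
```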
Classical index structures like suffix trees are powerful, but they occupy much more space than the data they are built on. Many space-efficient alternatives exist that occupy space close to that of the input data. This chapter covers such data structures based on the Burrows–Wheeler transform (BWT). Special emphasis is given to the bidirectional BWT index, which can be used to solve basic genome analysis tasks by simulating suffix tree exploration without any sacrifice in running time. Space-efficient representations of de Bruijn graphs are also covered.
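For intuition only, a naive way to obtain the Burrows–Wheeler transform by sorting all rotations of the text; practical BWT indexes are instead built with linear-time suffix sorting and compact data structures:

```python
# Naive illustration only: the BWT as the last column of the sorted rotations
# of T$. Quadratic in time and space; real indexes avoid explicit rotations.
def bwt(text, terminator="$"):
    t = text + terminator
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)

print(bwt("ACAACG"))  # -> "GC$AAAC"
```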
Graphs are a fundamental model for representing various relations among data. The aim of this chapter is to present some basic problems and techniques relating to graphs, mainly for finding particular paths in directed and undirected graphs. In the following chapters, we deal with various problems in biological sequence analysis that can be reduced to one of these basic ones.
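As a small, generic illustration of path finding (not a specific algorithm from the chapter), breadth-first search locates a shortest path in an unweighted graph given as an adjacency list:

```python
# Breadth-first search for a shortest path in an unweighted (here directed) graph.
from collections import deque

def shortest_path(adj, source, target):
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:   # reconstruct the path via parent pointers
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None  # target not reachable from source

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(shortest_path(adj, "a", "d"))  # ['a', 'b', 'd']
```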
In this chapter we assume that we have a collection of reads from all the different (copies of the) transcripts of a gene. We start by showing how to extend read alignment techniques to short RNA reads, and later we show how to exploit the output of genome analysis techniques to obtain an aligner for long reads of RNA transcripts. Our final goal is to assemble the reads into the different RNA transcripts and to estimate the expression level of each transcript. The main difficulty of this problem, which we call multi-assembly, arises from the fact that the transcripts share identical substrings. We illustrate different scenarios and corresponding multi-assembly formulations, which we then solve using network flow techniques.
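A hedged sketch of one ingredient of such flow-based formulations: greedily decomposing a flow on a DAG into source-to-sink paths, each path standing in for a candidate transcript with its flow value as an abundance estimate (the graph and flow values below are placeholders):

```python
# Greedy flow decomposition sketch. Assumes a DAG with flow conservation at
# internal nodes; each extracted path is one candidate transcript, and its
# bottleneck flow serves as a rough abundance estimate.
def decompose_flow(flow, source, sink):
    # flow: dict mapping (u, v) -> non-negative flow value on edge u -> v.
    paths = []
    while True:
        path, u = [source], source
        while u != sink:
            nxt = next((v for (x, v) in flow if x == u and flow[(x, v)] > 0), None)
            if nxt is None:
                return paths  # no remaining positive-flow path from the source
            path.append(nxt)
            u = nxt
        bottleneck = min(flow[(path[i], path[i + 1])] for i in range(len(path) - 1))
        for i in range(len(path) - 1):
            flow[(path[i], path[i + 1])] -= bottleneck
        paths.append((path, bottleneck))

flow = {("s", "a"): 5, ("a", "b"): 3, ("a", "c"): 2,
        ("b", "t"): 3, ("c", "t"): 2}
print(decompose_flow(flow, "s", "t"))
# [(['s', 'a', 'b', 't'], 3), (['s', 'a', 'c', 't'], 2)]
```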
A full-text index for a string T is a data structure that is built once and kept in memory for answering an arbitrarily large number of queries on the positions and frequencies of substrings of T. Such queries can be used for speeding up dynamic programming algorithms tailored for mapping reads to a reference genome – a fundamental task in the analysis of high-throughput sequencing data. This chapter covers the classical full-text indexes, including k-mer indexes, suffix arrays, and suffix trees. Linear-time algorithms for suffix sorting and for basic genome analysis tasks, such as finding maximal exact matches, are also presented.
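A minimal sketch, not the book's linear-time constructions: a suffix array built by naive sorting, then queried by binary search to locate all occurrences of a pattern:

```python
# Naive suffix array sketch. Construction and querying here are far from the
# linear-time, space-efficient methods of the book; this only shows the idea
# that sorted suffixes allow pattern occurrences to be found by binary search.
from bisect import bisect_left, bisect_right

def suffix_array(t):
    return sorted(range(len(t)), key=lambda i: t[i:])

def occurrences(t, sa, p):
    suffixes = [t[i:] for i in sa]              # explicit list: for illustration only
    lo = bisect_left(suffixes, p)
    hi = bisect_right(suffixes, p + "\xff")     # assumes '\xff' exceeds any text symbol
    return sorted(sa[lo:hi])

t = "ACAACG"
sa = suffix_array(t)
print(occurrences(t, sa, "AC"))  # positions of "AC" in T: [0, 3]
```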