Data structures for bioinformatics [closed]

2023-01-27 19:53 问答作者：

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers. 开发者_运维百科

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 5 years ago.

Improve this question

What are some data structures that should be known by somebody involved in bioinformatics? I guess that anyone is supposed to know about lists, hashes, balanced trees, etc., but I expect that there are domain specific data structures. Is there any book devoted to this subject?

The most fundamental data structure used in bioinformatics is string. There are also a whole range of different data structures representing strings. And algorithms like string matching are based on the efficient representation/data structures.

A comprehensive work on this is Dan Gusfield's Algorithms on Strings, Trees and Sequences

A lot of introductory books on bioinformatics will cover some of the basic structures you'd use. I'm not sure what the standard textbook is, but I'm sure you can find that. It might be useful to look at some of the language-specific books:

Bioinformatics Programming With Python
Beginning Perl for Bioinformatics

I chose those two as examples because they're published by O'Reilly, which, in my experience, publishes good quality books.

I just so happen to have the Python book on my hard drive, and a great deal of it talks about processing strings for bioinformatics using Python. It doesn't seem like bioinformatics uses any fancy special data structures, just existing ones.

Spatial hashing datastructures (kd-tree) for example are used often for nearest neighbor queries of arbitrary feature vectors as well as 3d protein structure analysis.

Best book for your $$ is Understanding Bioinformatics by Zvelebil because it covers everything from sequence analysis to structure comparison.

In addition to basic familiarity with the structures you mentioned, suffix trees (and suffix arrays), de Bruijn graphs, and interval graphs are used extensively. The Handbook of Computational Molecular Biology is very well written. I've never read the whole thing, but I've used it as a reference.

I also highly recommend this book, http://www.comp.nus.edu.sg/~ksung/algo_in_bioinfo/

And more recently, python is much more frequently used in bioinformatics than perl. So I really suggest you start with python, it is widely used in my projects.

Many projects in bioinformatics involve combining information from different, semi-structured sources. RDF and ontologies are essential for much of this. See, for example, the bio2RDF project. http://bio2rdf.org/. A good understanding of identifiers is valuable.

Much bioinformatics is exploratory and rapid lightweight tools are often used. See workflow tools such as Taverna where the primary resource is often a set of web services - so HTTP/REST are common.

Whatever your mathematical or computational expertise is, you are likely to find an application in computational biology. If not, make this another question of stackoverflow and you'll be helped :o)

As mentioned in the other answers, somewhat timeless are string comparisons and pattern discovery in 1-dimensional data since sequences are so easy to get. With a renewed interest in medical informatics though you also have two/three-dimensional image analysis that you run e.g. against genomic data. With molecular biochemistry you also have pattern searches on 3D surfaces and molecular simulations. To study drug effects you will work with gene networks and compare those across tissues. Typical challenges for big data and information integration apply. And then, you need statistical descriptions of the likelihood of a pattern or the clinical association of any features identified to be found by chance.

继续阅读：bioinformatics data-structures

Data structures for bioinformatics [closed]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？