Retrieving GWAS information with R
I am trying to get specific disease-related information from the GWAS catalog. This can be done directly from the website via a spreadsheet download. But I was wondering if I could possibly do it programmatically in R. Any suggestions will be greatly appreciated.
Thanks.
Avoks
Check out the function download.file() and the package RCurl (http://cran.r-project.org/web/packages/RCurl/index.html) - these should do what you are looking for.
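A minimal sketch of the download.file() route, in case it helps. The helper name fetch_file is made up for illustration, and the URL is an assumption on my part (it is not given in this thread), so check the GWAS Catalog downloads page for the current link before relying on it. The actual download call is left commented out so the snippet runs offline.

```r
# Hypothetical helper: download a file and return the local path.
fetch_file <- function(url, destfile = tempfile(fileext = ".tsv")) {
  # download.file() returns 0 on success and a non-zero code otherwise
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  destfile
}

# ASSUMED URL for the full associations file; verify it on the GWAS Catalog
# downloads page, since it may have changed.
gwas_url <- "https://www.ebi.ac.uk/gwas/api/search/downloads/full"

# Uncomment to fetch the file (needs network access):
# gwas_path <- fetch_file(gwas_url)
```

The returned path can then be passed to read.table() as usual.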
You will have to download the .tsv file(s) first and manually edit them. This is because GWAS Catalog files contain HTML character entities, such as the &#231; that encodes the ç (the fourth letter) in "Behçet's disease". By default, read.table() treats # as the start of a comment, so the rest of that line is dropped and the line ends up with too few fields. You then get an error message, e.g.:
line 2028 did not have 34 elements
So you download it first, open it in a plain text editor, replace every # with an empty string, and only then load it into R with:
read.table("gwas_catalog_v1.0-associations_e91_r2018-02-21.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE, quote = "")
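The find-and-replace step can probably also be done inside R, which keeps the whole workflow scripted. The sketch below uses a tiny made-up file (the column names and rows are invented for the demo, not taken from the real catalog) and shows two options: stripping every # with gsub() before parsing, or leaving the file untouched and switching off comment handling with the comment.char argument of read.table().

```r
# Build a small tab-separated demo file that reproduces the problem:
# one field holds an HTML numeric entity, so the line contains a '#'.
# (Made-up data; the real file has many more columns.)
demo_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "PUBMEDID\tDISEASE.TRAIT\tSNPS",
  "111\tBeh&#231;et's disease\trs0001",
  "222\tAsthma\trs0002"
), demo_path)

# Option 1: do the "replace every #" step in R instead of a text editor.
raw_lines   <- readLines(demo_path)
clean_lines <- gsub("#", "", raw_lines, fixed = TRUE)
gwas_stripped <- read.table(text = clean_lines, sep = "\t", header = TRUE,
                            stringsAsFactors = FALSE, quote = "")

# Option 2: keep the file as-is and tell read.table() not to treat
# '#' as a comment character.
gwas_raw <- read.table(demo_path, sep = "\t", header = TRUE,
                       stringsAsFactors = FALSE, quote = "",
                       comment.char = "")
```

Option 2 keeps the entities intact (e.g. "Beh&#231;et's disease"), which may be preferable if you want to decode them later; option 1 matches the manual edit described above.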