
How can I extract all the IUPAC names in the data available from PubChem (NCBI) into a text file?

I want to build lists of prefixes and suffixes of some length from all the IUPAC names in the PubChem database, so that I can use them further in my project as a feature. So I need all the IUPAC chemical names in a text file, or in some other format from which I can extract these lists.

Thanks.


Sounds like you need something like this NIST species list.

You can also search for most of them in the WebBook, but I could not find a download link for the complete set.

In our lab we got a CD (?) with the mass spectral database, which contained the database as a text file (complete? Well, it had about 250,000 substances). Maybe you can get that through one of the vendors.


The PubChem site lets you download a dump of their data via FTP. Why not use that?


PubChem data can be downloaded via ftp from the PubChem site. A complete description of the available data can be obtained here: https://pubchemdocs.ncbi.nlm.nih.gov/downloads

Of particular interest for the question of IUPAC names, the data are downloadable from the "Compound Extras" section of the ftp site: ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/

The README-Extras file in this location describes the data in detail. For the IUPAC names, the following information is provided:

CID-IUPAC.gz:

This is a listing of all CIDs with their computed IUPAC names. It is a gzipped text file with CID, tab, IUPAC on each line. Note that the names may contain UTF8 characters.

A download today (23-Apr-2020) contains 102,586,778 rows. An excerpt of the information is shown below.

> head CID-IUPAC
1       3-acetyloxy-4-(trimethylazaniumyl)butanoate
2       (2-acetyloxy-3-carboxypropyl)-trimethylazanium
3       5,6-dihydroxycyclohexa-1,3-diene-1-carboxylic acid
4       1-aminopropan-2-ol
5       (3-amino-2-oxopropyl) dihydrogen phosphate
6       1-chloro-2,4-dinitrobenzene
7       9-ethylpurin-6-amine
8       2,3-dihydroxy-3-methylpentanoic acid
9       (2,3,4,5,6-pentahydroxycyclohexyl) dihydrogen phosphate
11      1,2-dichloroethane
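
To get from that file to the prefix and suffix lists asked about in the question, something along these lines might work. It is only a minimal sketch: it assumes CID-IUPAC.gz has already been downloaded locally, that each line is CID, tab, name as the README describes, and that "prefix" and "suffix" simply mean the first and last n characters of a name. The function names (iter_names, count_affixes, count_affixes_in_file) are made up for illustration.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the IUPAC name from each 'CID<TAB>name' line, skipping malformed lines."""
    for line in lines:
        _cid, sep, name = line.rstrip("\r\n").partition("\t")
        if sep and name:
            yield name


def count_affixes(names: Iterable[str], n: int) -> Tuple[Counter, Counter]:
    """Count the first n and last n characters of every name with at least n characters."""
    prefixes: Counter = Counter()
    suffixes: Counter = Counter()
    for name in names:
        if len(name) >= n:
            prefixes[name[:n]] += 1
            suffixes[name[-n:]] += 1
    return prefixes, suffixes


def count_affixes_in_file(path: str, n: int) -> Tuple[Counter, Counter]:
    """Stream a local gzipped CID-IUPAC file (assumed path) without loading it all into memory."""
    # The README notes that names may contain UTF-8 characters, so decode explicitly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_affixes(iter_names(handle), n)


if __name__ == "__main__":
    # Three lines taken from the excerpt above, used here instead of the full file.
    sample = [
        "4\t1-aminopropan-2-ol\n",
        "6\t1-chloro-2,4-dinitrobenzene\n",
        "11\t1,2-dichloroethane\n",
    ]
    prefixes, suffixes = count_affixes(iter_names(sample), 3)
    print(prefixes.most_common(3))
    print(suffixes.most_common(3))
```

For the real data you would call count_affixes_in_file("CID-IUPAC.gz", n) for each length n you care about. Reading line by line keeps memory use bounded by the number of distinct prefixes and suffixes rather than by the roughly 100 million rows.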