How to extract all the IUPAC names in the data available from PubChem (NCBI) into a text file?

2023-04-09 02:41 Q&A author:

I want to build lists of prefixes and suffixes of a given length from all the IUPAC names in the PubChem database, so that I can use them later in my project as features. So I need all the IUPAC chemical names in a text file, or in some other format from which I can extract these lists.

Thanks.

It sounds like you need something like this NIST species list.

You can also look up most of them in the WebBook, but I could not find a download link for the complete set.

In our lab we got a CD(?) with the mass spectral database, which contained the database (complete? well, it had about 250,000 substances) as a text file. Maybe you can get that through one of the vendors.

The PubChem site lets you download a dump of its data via FTP. Why not use that?

PubChem data can be downloaded via FTP from the PubChem site. A complete description of the available data can be found here: https://pubchemdocs.ncbi.nlm.nih.gov/downloads

For IUPAC names specifically, the data can be downloaded from the "Compound Extras" section of the FTP site: ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/

The README-Extras file in this location describes the data in detail. For the IUPAC names, the following information is provided:

CID-IUPAC.gz:

This is a listing of all CIDs with their computed IUPAC names. It is a gzipped text file with CID, tab, IUPAC on each line. Note that the names may contain UTF8 characters.
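Given that layout, getting just the names into a plain text file could look roughly like the sketch below. It assumes CID-IUPAC.gz has already been downloaded; the function name extract_iupac_names and the output file name are made up for illustration. The file is streamed line by line, so the roughly 100 million rows never have to fit in memory at once.

```python
import gzip


def extract_iupac_names(src_path, dst_path):
    """Copy the name column of a gzipped CID<TAB>IUPAC file to dst_path.

    Hypothetical helper; returns the number of names written.
    """
    written = 0
    # Text mode with UTF-8, since the README says names may contain UTF8 characters.
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Split on the first tab only: left is the CID, right is the name.
            cid, sep, name = line.rstrip("\n").partition("\t")
            if not sep or not name:
                continue  # skip rows that have no name column
            dst.write(name + "\n")
            written += 1
    return written
```

A call such as extract_iupac_names("CID-IUPAC.gz", "iupac_names.txt") would then leave one name per line in iupac_names.txt.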

A download today (23-Apr-2020) contains 102,586,778 rows. An excerpt of the information is shown below.

> head CID-IUPAC
1       3-acetyloxy-4-(trimethylazaniumyl)butanoate
2       (2-acetyloxy-3-carboxypropyl)-trimethylazanium
3       5,6-dihydroxycyclohexa-1,3-diene-1-carboxylic acid
4       1-aminopropan-2-ol
5       (3-amino-2-oxopropyl) dihydrogen phosphate
6       1-chloro-2,4-dinitrobenzene
7       9-ethylpurin-6-amine
8       2,3-dihydroxy-3-methylpentanoic acid
9       (2,3,4,5,6-pentahydroxycyclohexyl) dihydrogen phosphate
11      1,2-dichloroethane
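For the prefix and suffix lists the question asks about, one possible reading is "the first and last n characters of each name". A minimal sketch under that assumption follows; affix_counts is a hypothetical helper, and the sample names are taken from the excerpt above. In practice the names would be read from the extracted text file rather than from a hard-coded list.

```python
from collections import Counter


def affix_counts(names, length):
    """Count the leading and trailing `length`-character substrings of each name."""
    if length < 1:
        raise ValueError("length must be at least 1")
    prefixes, suffixes = Counter(), Counter()
    for name in names:
        if len(name) < length:
            continue  # too short to have an affix of this length
        prefixes[name[:length]] += 1
        suffixes[name[-length:]] += 1
    return prefixes, suffixes


# A few names from the CID-IUPAC excerpt, used here only as sample input.
sample = [
    "1-aminopropan-2-ol",
    "1-chloro-2,4-dinitrobenzene",
    "1,2-dichloroethane",
    "2,3-dihydroxy-3-methylpentanoic acid",
]
prefixes, suffixes = affix_counts(sample, 2)
print(prefixes.most_common(1))
print(suffixes.most_common(1))
```

Counting with Counter rather than collecting into a set keeps the frequencies, which makes it easy to drop rare affixes later if only the common ones are useful as features.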
