Advice/Tips on the best way to spider/crawl/collect audio content from the internet

2023-01-30 11:21

What I'm actually trying to do is figure out how BEEMP3.COM works.

Because of the site's speed, I doubt they scrape other sites/sources on the spot. They probably use some sort of database (PostgreSQL or MySQL) to store the "results" and then just query the search terms.

My question is: how do you think they crawl/spider or otherwise get the mp3 files/content? They must have some algorithm to spider the internet, OR use the Google "index of mp3" trick to find hosts with the raw mp3 files.

Any comments and tips or ideas are appreciated :)

QueryPath is a great tool for building a web spider.

I'm guessing they find MP3s using a combination approach: they have a list of "seed sites" (gathered from Google, Usenet or manually inserted) that they use as starting points for the search, and then set spiders running against them.

You need to write a script that will:

Take a webpage as a starting point
Fetch the webpage data (use cURL)
Use a regular expression to extract (a) any links (b) any links to mp3 files
Place any MP3 links into a database
Add the list of links to other webpages to a queue for processing through the above method
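
The steps above can be sketched roughly as follows. This is only a minimal illustration, not how beemp3.com actually works: it is written in Python with the standard library (urllib standing in for cURL, sqlite3 standing in for MySQL/PostgreSQL), and the names fetch_page, extract_links, crawl and the mp3_links table are made up for the example. A real crawler would also want to respect robots.txt, rate-limit its requests and stay within the sites it is allowed to visit.

```python
import re
import sqlite3
from collections import deque
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen

# Simple regular expression for href="..." attributes (assumption: good
# enough for a sketch; it will miss unquoted or unusual markup).
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_page(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return (page_links, mp3_links) found in html, as absolute URLs."""
    pages, mp3s = [], []
    for href in HREF_RE.findall(html):
        url = urldefrag(urljoin(base_url, href)).url  # absolute, no #fragment
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, etc.
        if url.lower().split("?")[0].endswith(".mp3"):
            mp3s.append(url)
        else:
            pages.append(url)
    return pages, mp3s


def crawl(seed, conn, fetch=fetch_page, max_pages=50):
    """Breadth-first crawl from seed; store MP3 links; return pages fetched."""
    conn.execute("CREATE TABLE IF NOT EXISTS mp3_links (url TEXT PRIMARY KEY)")
    queue, seen, visited = deque([seed]), {seed}, 0
    while queue and visited < max_pages:
        page = queue.popleft()
        try:
            html = fetch(page)
        except Exception:
            continue  # unreachable page: skip it and move on
        visited += 1
        pages, mp3s = extract_links(html, page)
        # Place any MP3 links into the database (duplicates are ignored).
        conn.executemany(
            "INSERT OR IGNORE INTO mp3_links (url) VALUES (?)",
            [(u,) for u in mp3s],
        )
        # Queue the other links for processing through the same steps.
        for link in pages:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    conn.commit()
    return visited
```

Calling crawl("http://example.com/", sqlite3.connect("mp3.db")) would start from that one seed page; the max_pages cap is there so the sketch cannot run away across the whole web.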

You'll also need to re-check your MP3 links regularly to erase any bad links.
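
A re-check pass might look something like the sketch below. It assumes the links live in a SQLite table called mp3_links with a single url column (a hypothetical schema chosen for the example), and it treats a link as bad when an HTTP HEAD request fails or returns an error status. Some hosts reject HEAD requests, so a real checker may need to fall back to a small GET.

```python
import sqlite3
from urllib.error import URLError
from urllib.request import Request, urlopen


def is_alive(url, timeout=10):
    """Return True if a HEAD request for url succeeds (status below 400)."""
    try:
        req = Request(url, method="HEAD")
        with urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (URLError, ValueError, OSError):
        # HTTP errors, DNS failures, timeouts and malformed URLs all
        # count as a dead link in this sketch.
        return False


def prune_dead_links(conn, check=is_alive):
    """Delete every stored link that fails check; return how many were removed."""
    urls = [row[0] for row in conn.execute("SELECT url FROM mp3_links")]
    dead = [(u,) for u in urls if not check(u)]
    conn.executemany("DELETE FROM mp3_links WHERE url = ?", dead)
    conn.commit()
    return len(dead)
```

Running prune_dead_links from a scheduled job (cron, for instance) is one way to do the "regularly" part.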

Alternatively, you can crawl MP3 search engines like beemp3.com, extract all the direct download links, and save them to your database. You only need two files: I. Simple HTML DOM. II. An application that saves the extracted links to your database.
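
For illustration, here is roughly what the link-extraction half could look like with an HTML parser instead of a regular expression. Simple HTML DOM is a PHP library; this sketch uses Python's built-in html.parser as a stand-in, and the names Mp3LinkParser and find_mp3_links are invented for the example. It only handles plain <a href="...mp3"> links, so pages that build their download links with JavaScript would need a different approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class Mp3LinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point at .mp3 files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore the query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".mp3"):
            self.links.append(urljoin(self.base_url, href))


def find_mp3_links(html, base_url):
    """Return the direct MP3 links found in html, resolved against base_url."""
    parser = Mp3LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The returned list can then be handed to whatever code writes the links to your database.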

Check what I did at http://kenyaforums.com/bongomp3_external_link_search_engine_at_kenyaforums_com.php

Feel free to keep asking if anything is unclear.

