Looking for an Open Source Web Crawler that can crawl API requests and parse XML into csv [closed]
Closed 8 years ago.
I'm looking into web crawlers that can crawl an API and parse the returned XML into an XML or CSV file.
I've been playing around with requests against some API feeds, but it would be great if I didn't have to do this manually and could use something to fetch the data automatically and edit it later.
For example, using the API for a site called Eventful, I can request an XML feed of data:
http://api.eventful.com/rest/events/search?app_key=LksBnC8MgTjD4Wc5&location=pittsburgh&date=Future
If you open the link you can see that a ton of XML data is sent back.
I thought that since the XML data is already broken down into elements, it shouldn't be too difficult for the crawler to handle the sorting (e.g. the city element would send its data to a city column in the CSV file).
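To make concrete what I mean by "handling the sorting", here is a rough Python sketch of the fetch-and-flatten step I would want a crawler to do for me. The element names (event, title, city, start_time) are just my guesses at the feed's structure, and YOUR_APP_KEY is a placeholder:

# Rough sketch, not a full crawler: fetch the Eventful XML feed and
# flatten selected child elements of each <event> into CSV columns.
# Element names below are assumptions -- adjust to the actual feed.
import csv
import requests
import xml.etree.ElementTree as ET

URL = "http://api.eventful.com/rest/events/search"
params = {
    "app_key": "YOUR_APP_KEY",   # placeholder -- substitute a real key
    "location": "pittsburgh",
    "date": "Future",
}
FIELDS = ["title", "city", "start_time"]  # assumed element names

resp = requests.get(URL, params=params, timeout=30)
resp.raise_for_status()
root = ET.fromstring(resp.content)

with open("events.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    # .iter() walks the whole tree, so this finds <event> elements
    # no matter how deeply they sit under the root.
    for event in root.iter("event"):
        writer.writerow({name: (event.findtext(name) or "") for name in FIELDS})

Something that does this generically, without me hard-coding the element names for every feed, is what I'm after.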
I'm wondering if anyone has used an existing open source web crawler to crawl APIs and get the parsed data into an Excel-like format.
I looked into Nutch, but I couldn't find any reference in the documentation to sorting an XML response into an Excel-like document based on the elements returned by the API feed.
Has anyone done anything like this before, and can you recommend a program? Specifics would be really helpful.
We at http://import.io/ have a free solution similar to Mozenda: you build the API using our web browser, and then you can upload the API to our servers and use it for free. We also offer a crawler and various other features. Check it out and see what you think :)
P.S. I work for import.io, if you didn't get that already.
I found a paid solution called Mozenda...
I'll update if I can find something open source.