How To Aggregate API Data?
I have a system that connects to 2 popular APIs. I need to aggregate the data from each into a unified result that can then be paginated. The scope of the project means that the system could end up supporting tens of APIs.
Each API imposes a max limit of 50 results per request.
What is the best way of aggregating this data so that it is reliable, i.e. ordered, no duplicates, etc.?
I am using the CakePHP framework on a LAMP environment; however, I think this question applies to all programming languages.
My approach so far is to query the search API of each provider and then populate a MySQL table, from which the results are ordered, paginated, and so on. However, my concern is performance: API communication, parsing, inserting, and then reading all happen in a single execution.
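Roughly, the current flow looks like this (the endpoint URLs, response field names, and the aggregated_results table are made-up placeholders, not the real providers):

```php
<?php
// Sketch of the current approach: fetch, insert, then paginate,
// all within one request. Endpoints, fields and table are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');

$providers = [
    'provider_a' => 'https://api.provider-a.example/search?q=%s&limit=50',
    'provider_b' => 'https://api.provider-b.example/search?q=%s&limit=50',
];

$insert = $pdo->prepare(
    'INSERT IGNORE INTO aggregated_results (provider, external_id, title, url, created)
     VALUES (?, ?, ?, ?, NOW())'
);

foreach ($providers as $name => $template) {
    $json  = file_get_contents(sprintf($template, urlencode('search term')));
    $items = $json !== false ? (json_decode($json, true)['results'] ?? []) : [];
    foreach ($items as $item) {
        // A UNIQUE(provider, external_id) index plus INSERT IGNORE keeps out duplicates.
        $insert->execute([$name, $item['id'], $item['title'], $item['url']]);
    }
}

// Ordering and pagination then run against the local table:
$page = $pdo->query('SELECT * FROM aggregated_results ORDER BY created DESC LIMIT 20 OFFSET 0');
```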
Am I missing something? Does anyone have any other ideas? I'm sure this is a common problem with many alternative solutions.
Any help would be greatly appreciated.
Yes, this is a common problem.
Search SO for questions like https://stackoverflow.com/search?q=%5Bphp%5D+background+processing
Everyone who tries this realizes that calling other sites for data is slow. The first one or two seem quick, but then other sites break (and your app breaks) and other sites are slow (and your app is slow).
You have to disconnect the front-end from the back-end.
Choice 1 - pre-query the data with a background process that simply fetches the APIs and loads the database (see the sketch below).
Choice 2 - start a long-running background process and check back from a JavaScript function to see if it's done yet.
Choice 3 - the user's initial request spawns the background process -- you then email them a link so they can return when the job is done.
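For illustration, Choice 1 can be a plain PHP script run from cron; the provider URLs, response fields, and aggregated_results table here are hypothetical placeholders:

```php
<?php
// refresh_results.php -- run from cron, e.g. */10 * * * *
// Background loader (Choice 1). The front-end only ever reads the table,
// so a broken or slow provider never blocks a page request.
$pdo = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');

function fetchResults(string $url): array
{
    // Hard timeout so one slow provider cannot stall the whole job.
    $ctx  = stream_context_create(['http' => ['timeout' => 10]]);
    $json = @file_get_contents($url, false, $ctx);
    if ($json === false) {
        return []; // provider down: skip it, the app keeps working
    }
    return json_decode($json, true)['results'] ?? [];
}

$insert = $pdo->prepare(
    'INSERT IGNORE INTO aggregated_results (provider, external_id, title, url, created)
     VALUES (?, ?, ?, ?, NOW())'
);

$providers = [
    'provider_a' => 'https://api.provider-a.example/search?q=term&limit=50',
    'provider_b' => 'https://api.provider-b.example/search?q=term&limit=50',
];

foreach ($providers as $name => $url) {
    foreach (fetchResults($url) as $item) {
        $insert->execute([$name, $item['id'], $item['title'], $item['url']]);
    }
}
```

The page request then just orders and paginates the table and never waits on a remote API.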
I have a site doing just that with over 100 RSS/Atom feeds. This is what I do:
- I have a list of feeds and a cron job that iterates over them, about 5 feeds a minute, meaning I cycle through all the feeds every 20 minutes or so.
- I fetch the feed and try to insert each entry into the database, using the URL as a unique field; if the URL already exists, I do not insert. The entry date is my current system clock and is inserted by my application, as date fields in RSS cannot be trusted and, in some cases, can't even be parsed (a sketch of this insert follows the list).
- For some feeds, and only experience can tell you which, I also check for duplicate titles, as some websites change their URLs for their own reasons.
- The items are now all placed in the same database table, ready to be queried.
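To illustrate the insert step, here is a minimal sketch; the table layout and the $feedId/$entries variables are placeholders for your own feed-parsing code:

```php
<?php
// Dedup-on-insert for one feed's entries. Assumes a schema along the lines of:
//   CREATE TABLE entries (
//       id      INT AUTO_INCREMENT PRIMARY KEY,
//       feed_id INT NOT NULL,
//       title   VARCHAR(255) NOT NULL,
//       url     VARCHAR(255) NOT NULL UNIQUE,
//       created DATETIME NOT NULL
//   );
$pdo = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass');

$feedId  = 42; // id of the feed being processed (from your feeds table)
$entries = [   // parsed items from the fetched feed, e.g. via SimplePie
    ['title' => 'Example article', 'url' => 'https://example.com/article'],
];

$insert = $pdo->prepare(
    // NOW() is the application/database clock, not the feed's own pubDate,
    // because feed dates cannot be trusted.
    'INSERT IGNORE INTO entries (feed_id, title, url, created) VALUES (?, ?, ?, NOW())'
);

foreach ($entries as $entry) {
    // INSERT IGNORE silently skips rows whose url already exists:
    // "if the URL exists, I do not insert".
    $insert->execute([$feedId, $entry['title'], $entry['url']]);
}
```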
One last thought: if your application is likely to have new feeds added while in production, you really should also check whether a feed is "new" (i.e. has no previous entries in the db). If it is, you should mark all currently available links as inactive; otherwise, when you add a feed, a block of articles from that feed will appear, all with the same date and time. (Simply put: the method I described works for future additions to the feed only; past articles will not be available.)
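Continuing the sketch above, that "new feed" check could look like this (the active column is an assumed addition to the schema):

```php
<?php
// "New feed" guard: if this feed has no rows yet, store its current backlog
// as inactive so the listing is not flooded with same-timestamp articles.
// Assumes an extra `active TINYINT(1) NOT NULL DEFAULT 1` column on entries.
$count = $pdo->prepare('SELECT COUNT(*) FROM entries WHERE feed_id = ?');
$count->execute([$feedId]);
$isNewFeed = (int) $count->fetchColumn() === 0;

$insert = $pdo->prepare(
    'INSERT IGNORE INTO entries (feed_id, title, url, created, active)
     VALUES (?, ?, ?, NOW(), ?)'
);

foreach ($entries as $entry) {
    // The backlog of a brand-new feed goes in hidden (active = 0);
    // entries appearing on later cron runs are inserted live.
    $insert->execute([$feedId, $entry['title'], $entry['url'], $isNewFeed ? 0 : 1]);
}
```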
Hope this helps.