PHP: connecting to the MediaWiki API and retrieving data
I noticed there was a question somewhat similar to mine, only for C#. Let me explain: I'm very new to implementing web services, so I'm having some difficulty understanding it (especially given the vague MediaWiki API manual).
I want to retrieve the entire Wikipedia Main Page as a string (XML) in PHP and then process it in PHP (I'm pretty sure there are more sophisticated ways to parse XML files, but whatever).
I tried doing $fp = fopen($url,'r');
but it outputs: HTTP request failed! HTTP/1.0 400 Bad Request
The API does not require a key to connect to it.
Can you describe in detail how to connect to the API and get the page as a string?
EDIT:
The URL is $url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main Page';
I simply want to read the entire content of the file into a string so I can use it.
Connecting to that API is as simple as retrieving the file:
fopen
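$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$c = ''; // initialise the buffer before appending to it
$fp = fopen($url, 'r');
if ($fp === false) {
    die('Could not open the URL');
}
while (!feof($fp)) {
    $c .= fread($fp, 8192);
}
fclose($fp);
echo $c;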
file_get_contents
$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$c = file_get_contents($url);
echo $c;
The two approaches above work only if your server has the fopen wrappers enabled (the allow_url_fopen setting in php.ini).
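Because allow_url_fopen is a system-level setting (it can be changed in php.ini or the server configuration, but not with ini_set() at runtime), it can help to check which of the two approaches is available before picking one. This is a minimal sketch; the function name availableTransport() is hypothetical and not part of PHP.

```php
<?php
// Report which transport the current PHP setup allows for remote HTTP requests.
// availableTransport() is a hypothetical helper name used only for illustration.
function availableTransport(): string
{
    // allow_url_fopen governs fopen()/file_get_contents() on http:// URLs.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'fopen';
    }
    // Fall back to the cURL extension when URL wrappers are disabled.
    if (extension_loaded('curl')) {
        return 'curl';
    }
    return 'none';
}

echo availableTransport(), PHP_EOL;
```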
Otherwise, if your server has the cURL extension installed, you can use that:
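$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, e.g. http -> https
curl_setopt($ch, CURLOPT_USERAGENT, 'MyWikiClient/0.1 (you@example.org)'); // placeholder: identify your own client
$c = curl_exec($ch);
if ($c === false) die('cURL error: ' . curl_error($ch));
curl_close($ch);
echo $c;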
You probably need to urlencode the parameters that you are passing in the query string; here, at least "Main Page" requires encoding. Without this encoding, I get a 400 error too.
If you try this, it should work better (note that the space is replaced by %20):
$url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$str = file_get_contents($url);
var_dump($str);
With this, I'm getting the content of the page.
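The question also asks about processing the XML once it is in a string. Below is a minimal sketch using PHP's built-in SimpleXML. It assumes the response puts the wikitext inside a <rev> element, or, as I understand newer API versions when rvslots is requested, inside <rev><slots><slot>. The function name extractWikitext() and the sample XML are illustrative, not real API output.

```php
<?php
// Map each page title in a format=xml API response to its wikitext.
// Assumed response shapes: <page><revisions><rev>text</rev> or
// <page><revisions><rev><slots><slot>text</slot></slots></rev>.
function extractWikitext(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Response is not well-formed XML');
    }
    $result = [];
    foreach ($doc->xpath('//page') ?: [] as $page) {
        $title = (string) $page['title'];
        // Prefer the slot element if present; otherwise take the text of <rev>.
        $slot = $page->xpath('.//rev/slots/slot');
        $rev = $page->xpath('.//rev');
        if ($slot) {
            $result[$title] = (string) $slot[0];
        } elseif ($rev) {
            $result[$title] = (string) $rev[0];
        }
    }
    return $result;
}

// Illustrative sample shaped like an API response (not real output).
$sample = <<<XML
<api><query><pages><page pageid="1" ns="0" title="Main Page"><revisions><rev xml:space="preserve">Hello '''world'''</rev></revisions></page></pages></query></api>
XML;

print_r(extractWikitext($sample));
```

In practice you would pass the string fetched above, e.g. extractWikitext($str).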
A solution is to use urlencode, so you don't have to do the encoding yourself:
$url='http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=' . urlencode('Main Page');
$str = file_get_contents($url);
var_dump($str);
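Instead of encoding each value by hand, PHP's http_build_query() can assemble and encode the whole query string from an array. This is a minimal sketch. Note that urlencode() and http_build_query()'s default mode encode a space as +, while rawurlencode() and the PHP_QUERY_RFC3986 mode produce %20; both are decoded as a space in a query string. Writing 'redirects' => 1 in place of the bare &redirects flag is an assumption on my part, based on MediaWiki treating a boolean parameter as true whenever it is present.

```php
<?php
// Build the API query string from an array so every value is encoded for you.
// 'redirects' => 1 stands in for the bare "&redirects" flag in the original URL.
$params = [
    'action'    => 'query',
    'prop'      => 'revisions',
    'rvprop'    => 'content',
    'format'    => 'xml',
    'redirects' => 1,
    'titles'    => 'Main Page',
];

// Default encoding (RFC 1738) turns the space into "+", like urlencode().
$query = http_build_query($params, '', '&');

// RFC 3986 encoding turns the space into "%20", like rawurlencode().
$query3986 = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

$url = 'http://en.wikipedia.org/w/api.php?' . $query;
echo $url, PHP_EOL;
```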
According to the MediaWiki API docs, if you don't specify a User-Agent in your PHP request, Wikimedia will refuse the request with a 4xx HTTP response code:
https://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client
You might try updating your code to add that request header, or set the user_agent directive in php.ini if you have edit access to it.
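One way to add the header without touching php.ini is a stream context, which both file_get_contents() and fopen() accept. This is a minimal sketch; the constant WIKI_USER_AGENT, its value, and the helper wikiContext() are placeholders to replace with your own client name and contact details.

```php
<?php
// Placeholder identifier: replace with your own tool name and contact address.
const WIKI_USER_AGENT = 'MyWikiClient/0.1 (https://example.org/; you@example.org)';

// Build a stream context whose HTTP requests carry an explicit User-Agent header.
// wikiContext() is a hypothetical helper name used only for illustration.
function wikiContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
        ],
    ]);
}

$context = wikiContext(WIKI_USER_AGENT);
```

Usage: $str = file_get_contents($url, false, $context); Alternatively, ini_set('user_agent', WIKI_USER_AGENT); sets the same header for every request made through PHP's HTTP stream wrapper for the rest of the script.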