
Why does the web page I fetch with Perl look odd?

I have a Perl script that opens the page http://svejo.net/popular/all/new/ and filters out the names of the posts, but apart from the headers, everything seems encrypted. Nothing can be read.

When I open the same page in a browser everything looks fine, including the source code. How is it possible to encrypt a page for a script and not for a browser? My Perl script sends the same headers as my browser (Google Chrome).


The page looks fine to me, although I don't read Bulgarian.

#!perl

use LWP::Simple;

getprint( 'http://svejo.net/popular/all/new/' );

This script returns the plain page without anything that looks odd or encrypted:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bg" lang="bg">
  <head>

<title>Svejo — Популярните новини </title>

What were you trying, and which versions of perl and the modules are you using? What is the output that you are seeing?

You clarify that you are using ActivePerl on Windows (please update your question with additional details). Remember, not only do you need to do the right Unicode things in your programs, but your terminal has to be set up to display Unicode properly.


What happens when you explicitly binmode your output?

 binmode STDOUT, ':utf8';

Try saving the output to a file and looking at it in an editor that understands UTF-8.
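One way to try the save-to-file suggestion is sketched below. It is only an illustration: `save_utf8_page` and the file name `svejo.html` are made-up names, and the byte string stands in for whatever your fetch actually returns (here, the UTF-8 bytes of one Cyrillic word from the page title). The idea is to decode the raw bytes into characters once, then write them through an explicit UTF-8 output layer.

```perl
#!perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of LWP): decode raw UTF-8 bytes, such as
# those returned by a fetch, and write the text through a UTF-8 layer.
sub save_utf8_page {
    my ($raw_bytes, $path) = @_;
    my $text = decode('UTF-8', $raw_bytes);    # bytes -> characters
    open my $fh, '>:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    print {$fh} $text;                         # characters -> UTF-8 bytes
    close $fh or die "Cannot close $path: $!";
    return length $text;                       # characters, not bytes
}

# Stand-in for fetched content: "Svejo " followed by "новини" as UTF-8 bytes.
my $raw   = "Svejo \xD0\xBD\xD0\xBE\xD0\xB2\xD0\xB8\xD0\xBD\xD0\xB8";
my $chars = save_utf8_page($raw, 'svejo.html');
print "Wrote $chars characters to svejo.html\n";
```

If the saved file reads correctly in a UTF-8-aware editor, the data itself is fine and the problem is in how the console displays it.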


Okay, that didn't work. Let's get even more general and set all handles to use UTF-8 by default:

  use open IO  => ':utf8';


The page is encoded with UTF-8. Perhaps your Perl script is using a different encoding?

I found this page that describes Processing UTF-8 Files with Perl.
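To see what "a different encoding" does to the text, here is a small sketch using only the core Encode module. The byte string is an assumption standing in for part of the page: it is the UTF-8 encoding of the word "новини" from the title. Decoding the same bytes as Latin-1 produces twice as many characters, all of them garbage.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for the Cyrillic word "новини" (6 letters, 2 bytes each).
my $bytes = "\xD0\xBD\xD0\xBE\xD0\xB2\xD0\xB8\xD0\xBD\xD0\xB8";

my $as_utf8   = decode('UTF-8',      $bytes);   # correct interpretation
my $as_latin1 = decode('ISO-8859-1', $bytes);   # wrong: one char per byte

printf "bytes: %d, UTF-8 characters: %d, Latin-1 characters: %d\n",
    length($bytes), length($as_utf8), length($as_latin1);
# prints "bytes: 12, UTF-8 characters: 6, Latin-1 characters: 12"
```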
