Perl Meteor server Unicode problem
When I subscribe to a channel named "public" with Meteor, I get the expected response:
<script>ch("public",0)</script>
The HTTP request is:
GET /push/131530959548387383/xhrinteractive/public?nc=1315309595740 HTTP/1.1
But when I subscribe to a channel whose name mixes English and Hebrew, such as "tag-קוקאין", the HTTP request is:
GET /push/1315309516300999786/xhrinteractive/tag-%D7%A7%D7%95%D7%A7%D7%90%D7%99%D7%9F?nc=1315309516590 HTTP/1.1
and I get an odd response in which the Hebrew part of the channel name is replaced by dots:
<script>ch("tag-............", 0);</script>
So I dug through the Meteor files and came across Subscriber.pm, which parses the request header and connects the end user to the right channel.
The part that checks the header with a regex is:
if($self->{'headerBuffer'}=~/GET\s+$::CONF{'SubscriberDynamicPageAddress'}\/([0-9a-z]+)\/([0-9a-z]+)\/([a-z0-9_\-\%\.\/]+).*?/i)
{
    $self->{'subscriberID'}=$1;
    $self->{'mode'}=$2;
    my $persist=$self->getConf('Persist');
    my $maxTime=$self->getConf('MaxTime');
    $self->{'ConnectionTimeLimit'} = ($self->{'ConnectionStart'}+$maxTime) if ($maxTime>0);
    my @channelData=split('/',$3);
    my $channels={};
    my $channelName;
    my $offset;
    foreach my $chandef (@channelData) {
        if($chandef=~/^([a-z0-9_\-\%]+)(.(r|b|h)([0-9]*))?$/i) {
            $channelName = $1;
            $channels->{$channelName}->{'startIndex'} = undef;
            if ($3) {
                $offset = $4;
                if ($3 eq 'r') { $channels->{$channelName}->{'startIndex'} = $offset; }
                if ($3 eq 'b') { $channels->{$channelName}->{'startIndex'} = -$offset; }
                if ($3 eq 'h') { $channels->{$channelName}->{'startIndex'} = 0; }
            }
        }
    }
where $::CONF{'SubscriberDynamicPageAddress'} is "push".
There has to be a fix for this somewhere. I have searched everywhere to no avail, and I would be grateful if someone could point me in the right direction.
I don't know Meteor. Is it possible for you to change the part that checks the header?
The problem is that the software allows only the ASCII letters a to z (case-insensitive because of the i modifier at the end), which is completely out of date.
The Unicode approach would check for \p{L} instead of a-z. \p{L} matches any Unicode code point with the Letter property, in any language.
See regular-expressions.info for details.
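To illustrate the difference, here is a minimal standalone sketch (not Meteor code). It assumes the channel name has already been decoded into a Perl character string; \p{L} will not help on raw bytes or on percent-encoded text. The variable names are made up for the example.

```perl
use strict;
use warnings;
use utf8;                                   # this source contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');

my $name = "tag-קוקאין";                    # decoded character string

# ASCII-only class, in the style of the original channel check
my $ascii_ok = $name =~ /^[a-z0-9_\-]+$/i ? 1 : 0;

# Unicode-aware class: any letter or decimal digit in any script
my $unicode_ok = $name =~ /^[\p{L}\p{Nd}_\-]+$/ ? 1 : 0;

print "ascii: $ascii_ok, unicode: $unicode_ok\n";   # ascii: 0, unicode: 1
```

The ASCII class rejects the Hebrew letters, while the \p{L} class accepts them.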
But this does not explain the dots in the response; the non-ASCII letters must be replaced somewhere else.
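A quick check seems to support this. The sketch below runs the header regex from the question against the request line from the question, outside of Meteor. It assumes the configured address is "/push" (with the leading slash) so that the pattern lines up with the request; $addr and $request are names invented for the test.

```perl
use strict;
use warnings;

# Assumption: the address is "/push", matching the request line in the question.
my $addr    = '/push';
my $request = 'GET /push/1315309516300999786/xhrinteractive/'
            . 'tag-%D7%A7%D7%95%D7%A7%D7%90%D7%99%D7%9F?nc=1315309516590 HTTP/1.1';

my ($id, $mode, $channels);
if ($request =~ /GET\s+$addr\/([0-9a-z]+)\/([0-9a-z]+)\/([a-z0-9_\-\%\.\/]+).*?/i) {
    ($id, $mode, $channels) = ($1, $2, $3);
}

# The percent-encoded name passes through the character class untouched.
print "$channels\n";   # tag-%D7%A7%D7%95%D7%A7%D7%90%D7%99%D7%9F
```

Because % and the hex digits are all inside the allowed class, the regex captures the encoded name as-is, so whatever turns it into dots happens after this point.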
OK, I thought about this a bit more. I think the problem is not the regex. The problem is that the encoding of Unicode characters in URIs is not perfectly reliable (see Wikipedia). It seems to me that Meteor at some point fails to decode your Hebrew characters and replaces them with dots.
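As a rough sketch of what might be going on (this is a guess, not Meteor's actual code): percent-decoding the name yields raw UTF-8 bytes, and those bytes only become Hebrew letters if they are then decoded as UTF-8. The "sanitizer" below is purely hypothetical; it just shows that replacing every non-ASCII byte with a dot produces one dot per byte, which you can compare with the response you are seeing.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw = 'tag-%D7%A7%D7%95%D7%A7%D7%90%D7%99%D7%9F';

# Step 1: percent-decode into raw bytes
(my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Step 2: interpret those bytes as UTF-8 to get real characters
my $chars = decode('UTF-8', $bytes);

# Hypothetical sanitizer: every byte outside printable ASCII becomes a dot
(my $dotted = $bytes) =~ s/[^\x20-\x7E]/./g;

print length($bytes), " bytes, ", length($chars), " characters\n";  # 16 bytes, 10 characters
print "$dotted\n";                                                  # tag-............
```

Each of the six Hebrew letters takes two bytes in UTF-8, so the byte-wise replacement prints twelve dots. If the response contains the same number of dots, it would be worth looking in the Meteor source for a place where the decoded bytes are filtered or escaped before being written out.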