TFramedTransport Error on PHPCassa + Cassandra
We're deleting a massive number of records in Cassandra and get the following error. The same error appears when we insert a massive number of records:
Error performing remove on 10.130.279.40:9160: exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from 10.130.279.40:9160' in /home/zonefiles/php/thrift/transport/TSocket.php:268
Stack trace:
0 /home/zonefiles/php/thrift/transport/TTransport.php(87): TSocket->read(4)
1 /home/zonefiles/php/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4)
2 /home/zonefiles/php/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame()
3 [internal function]: TFramedTransport->read(8192)
4 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(691): thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated), 'cassandra_Cassa...', false)
5 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(664): CassandraClient->recv_remove()
6 [internal function]: CassandraClient->remove('CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
7 /home/zonefiles/php/connection.php(230): call_user_func_array(Array, Array)
8 /home/zonefiles/php/columnfamily.php(582): ConnectionPool->call('remove', 'CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
9 /home/zonefiles/php/delete.php(34): ColumnFamily->remove('CUSTOMERSERVICE...')
10 {main}
Error connecting to 10.130.279.40:9160: exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from 10.130.279.40:9160' in /home/zonefiles/php/thrift/transport/TSocket.php:268
Stack trace:
0 /home/zonefiles/php/thrift/transport/TTransport.php(87): TSocket->read(4)
1 /home/zonefiles/php/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4)
2 /home/zonefiles/php/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame()
3 [internal function]: TFramedTransport->read(8192)
4 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(1015): thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated), 'cassandra_Cassa...', false)
5 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(992): CassandraClient->recv_describe_version()
6 /home/zonefiles/php/connection.php(63): CassandraClient->describe_version()
7 /home/zonefiles/php/connection.php(163): ConnectionWrapper->__construct('CDTMain1', '10.130.279.40:9...', NULL, true, 5000, 5000)
8 /home/zonefiles/php/connection.php(254): ConnectionPool->make_conn()
9 /home/zonefiles/php/connection.php(241): ConnectionPool->handle_conn_failure(Object(ConnectionWrapper), 'remove', Object(TTransportException), 1)
10 /home/zonefiles/php/columnfamily.php(582): ConnectionPool->call('remove', 'CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
11 /home/zonefiles/php/delete.php(34): ColumnFamily->remove('CUSTOMERSERVICE...')
12 {main}
Here is the PHP we use to generate the error:
<?php
set_time_limit(2000);
require 'connection.php';
require 'columnfamily.php';
$servers[0]['host'] = 'private ip';
$servers[0]['port'] = '9160';
$conn = new Connection('Server11', $servers);
$urlFamily = new ColumnFamily($conn, 'Domain'); // ColumnFamily
$start = microtime(true);
$limit = 100000000;
$rows = $urlFamily->get_range('', 'zzzzzzzzzzzzzzz', $limit);
$num = 0;
$delCount = 0;
foreach($rows as $key => $columns) {
// Do stuff with $key or $columns
if (strpos($key, ' .net') !== false) {
//echo 'deleting ' . $key . "\n";
$urlFamily->remove($key);
$delCount++;
}
if ($num++ > 100000000) break;
//$num++;
if ($num % 100000 == 0) echo $num . "\n";
}
$end = microtime(true);
echo $num . " total\n";
echo $delCount . ' deleted in ' . ($end - $start) . " seconds\n";
echo $delCount / ($end - $start) . " deleted per second\n";
?>
We are running PHP 5.3.5 on Fedora 14 Laughlin and Thrift 0.5.0.
One theory is that this is caused by Cassandra not being able to process the commands fast enough. Do you agree or disagree? Have you seen this before?
If you recommend deleting a different way (e.g. Truncate), how do we still prevent this issue from happening when we do other things with Cassandra?
Are those just log messages, or is an exception actually being raised? phpcassa calls error_log() every time it catches an exception like this, before retrying with a different connection. So you should keep an eye on the stack traces that get logged, but you don't need to worry too much about them.
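To make the "log, then retry on another connection" behaviour concrete, here is a minimal sketch of the pattern. It is not phpcassa's actual code: the function name call_with_failover, its parameters, and the plain server strings are all made up for illustration.

```php
<?php
// Hypothetical helper, NOT phpcassa's implementation: try an operation
// against each server in turn, logging every failure before moving on.
function call_with_failover(array $servers, $operation, $max_attempts = 3)
{
    if (count($servers) === 0) {
        throw new InvalidArgumentException('No servers given');
    }
    $last_error = null;
    for ($attempt = 0; $attempt < $max_attempts; $attempt++) {
        // Rotate through the server list on each attempt.
        $server = $servers[$attempt % count($servers)];
        try {
            return call_user_func($operation, $server);
        } catch (Exception $e) {
            // Log and retry; the caller only sees an exception
            // if every attempt fails.
            error_log("Error performing operation on $server: " . $e->getMessage());
            $last_error = $e;
        }
    }
    throw new RuntimeException("All $max_attempts attempts failed", 0, $last_error);
}
```

With a pattern like this, a stack trace in the log only means one attempt failed. The script itself fails only when the final exception is thrown.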
Those are client-side socket timeouts, which means that the call has taken longer than the default timeout of 5 seconds. Why exactly these are happening in the first place depends a lot on how Cassandra is behaving. Monitoring Cassandra is probably the best place to start.
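On the client side, one way to see how close individual calls come to the timeout is to time them. The sketch below is only an illustration: timed_call, its parameters, and the 80% warning threshold are assumptions and not part of phpcassa or Thrift.

```php
<?php
// Hypothetical helper: run a callable and report how long it took,
// flagging calls that come close to the client-side timeout.
function timed_call($operation, $timeout_ms = 5000, $warn_ratio = 0.8)
{
    $start = microtime(true);
    $result = call_user_func($operation);
    $elapsed_ms = (microtime(true) - $start) * 1000;

    // A call counts as "slow" once it uses $warn_ratio of the timeout budget.
    $slow = $elapsed_ms >= $timeout_ms * $warn_ratio;
    if ($slow) {
        error_log(sprintf('Slow call: %.0f ms (timeout is %d ms)', $elapsed_ms, $timeout_ms));
    }
    return array('result' => $result, 'elapsed_ms' => $elapsed_ms, 'slow' => $slow);
}
```

In the delete loop from the question, the remove call could be wrapped in a closure and passed to a helper like this, so that slow removes show up in the log before they start timing out.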
According to my programmer, we actually fixed this by jacking up the timeouts to a very high value. We were trying to import a 5GB file, so I guess the db needed more than 5 seconds per read.
Here are the specific timeouts that were set (in milliseconds, so 60 seconds each):
$send_timeout = 60000
$recv_timeout = 60000