<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Cassandra crashes when running a import script for big data</title>
		<link>http://www.datastax.com/support-forums/topic/cassandra-crashes-when-running-a-import-script-for-big-data</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Mon, 20 May 2013 13:44:56 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/cassandra-crashes-when-running-a-import-script-for-big-data" rel="self" type="application/rss+xml" />

		<item>
			<title>Anonymous on "Cassandra crashes when running a import script for big data"</title>
			<link>http://www.datastax.com/support-forums/topic/cassandra-crashes-when-running-a-import-script-for-big-data#post-1754</link>
			<pubDate>Tue, 24 Apr 2012 15:40:33 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1754@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Dear All,&#60;/p&#62;
&#60;p&#62;I'm working on an import script that processes and imports a very large dataset (over 3 GB, millions of records) into a Cassandra keyspace. The script always crashes after importing around 2,000,000 records, with the error below:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;      - test_application: 1727062
      - test_application: 1727063
/usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in `read_frame'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/protocol/binary_protocol.rb:194:in `read_i32'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/protocol/binary_protocol.rb:120:in `read_message_begin'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `receive_message'
        from /home/deployer/.bundler/ruby/1.9.1/cassandra-4bca4c8f0dc7/vendor/1.0/gen-rb/cassandra.rb:111:in `recv_multiget_slice'
        from /home/deployer/.bundler/ruby/1.9.1/cassandra-4bca4c8f0dc7/vendor/1.0/gen-rb/cassandra.rb:103:in `multiget_slice'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
        from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in `multiget_slice'
        from /home/deployer/.bundler/ruby/1.9.1/cassandra-4bca4c8f0dc7/lib/cassandra/protocol.rb:83:in `_multiget'
        from /home/deployer/.bundler/ruby/1.9.1/cassandra-4bca4c8f0dc7/lib/cassandra/cassandra.rb:619:in `multi_get'
        from /usr/local/lib/ruby/gems/1.9.1/gems/active_column-0.2/lib/active_column/base.rb:25:in `find'&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;I've already tried the fixes below, which I found by Googling, but with no luck. Please help.&#60;/p&#62;
&#60;p&#62;*. Fix for this error (did not work):&#60;br /&#62;
   thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException&#60;/p&#62;
&#60;p&#62;   That's a Ruby 1.9 issue.&#60;br /&#62;
   As suggested, we convert non-ASCII strings to binary before writing them. In ~/.rvm/gems/ruby-1.9.2-p290/gems/thrift-0.7.0/lib/thrift/protocol/binary_protocol.rb, this is the suggested patch:&#60;/p&#62;
&#60;p&#62;   sudo nano /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/protocol/binary_protocol.rb&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;def write_string(str)
  # Ruby 1.9's String#length counts characters, but the length prefix
  # must be a byte count, so re-tag non-ASCII strings as binary first.
  if str.encoding.to_s != &#34;US-ASCII&#34;
    str = str.unpack(&#34;a*&#34;).first
  end
  write_i32(str.length)
  trans.write(str)
end&#60;/code&#62;&#60;/pre&#62;
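&#60;p&#62;The reasoning behind that patch can be checked in plain Ruby. Below is a minimal sketch that uses only the standard library. &#60;code&#62;to_thrift_binary&#60;/code&#62; and &#60;code&#62;encode_thrift_string&#60;/code&#62; are hypothetical names that mirror the patched &#60;code&#62;write_string&#60;/code&#62;; they are not part of the thrift gem's API. For a non-ASCII UTF-8 string, &#60;code&#62;String#length&#60;/code&#62; (characters) and &#60;code&#62;String#bytesize&#60;/code&#62; (bytes) differ. A character-count prefix would therefore be too short for the bytes that follow it. Whether this is what crashed the import above is not established.&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;
```ruby
# Sketch of the idea behind the write_string patch (not the thrift gem's
# own code). Thrift's binary protocol writes a string as a 4-byte
# big-endian length followed by the raw bytes, so the length must be a
# byte count. In Ruby 1.9+, String#length counts characters instead.
def to_thrift_binary(str)
  return str if str.encoding == Encoding::US_ASCII
  # Same effect as str.unpack("a*").first: identical bytes, tagged
  # ASCII-8BIT, so #length now equals #bytesize.
  str.dup.force_encoding(Encoding::BINARY)
end

# Hypothetical helper: the length-prefixed bytes the patched method sends.
def encode_thrift_string(str)
  bin = to_thrift_binary(str)
  [bin.length].pack("N") + bin   # "N" = 32-bit big-endian length
end

word = "caf\u00e9"                          # "café" in UTF-8
puts word.length                            # prints 4 (characters)
puts word.bytesize                          # prints 5 (bytes)
puts encode_thrift_string(word).bytesize    # prints 9 (4-byte prefix + 5)
```
&#60;/code&#62;&#60;/pre&#62;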
&#60;p&#62;*. Concurrent requests crash operations (did not seem to work)&#60;br /&#62;
   /Users/mseeger/.rvm/gems/ruby-1.9.3-rc1/gems/thrift-0.7.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException&#60;/p&#62;
&#60;p&#62;   I have two ruby processes: one process bulk inserting data (long running), the other process searching (manually launched during the insert)&#60;/p&#62;
&#60;p&#62;   If I launch a search during the insertion, one of the two jobs will die.   &#60;/p&#62;
&#60;p&#62;   I think it MIGHT have something to do with the way I inserted the data.&#60;br /&#62;
   When I was doing batch insertions (500 at a time), it crashed. When I insert the records one by one (slower), the crashes don't seem to happen. The search operation stayed the same.&#60;/p&#62;
&#60;p&#62;   OK, running the client with &#60;code&#62;{:retries =&#38;gt; 10, :timeout =&#38;gt; 15, :connect_timeout =&#38;gt; 15}&#60;/code&#62; seems to solve the problem. I guess it was just a matter of connection defaults: for a single-node development &#34;cluster&#34; they seem to be too low.&#60;/p&#62;
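&#60;p&#62;The two observations quoted above can also be combined at the application level. Smaller writes crashed less often, and more generous retries helped. The &#60;code&#62;:retries&#60;/code&#62;, &#60;code&#62;:timeout&#60;/code&#62; and &#60;code&#62;:connect_timeout&#60;/code&#62; settings belong to the client. The sketch below is a separate, hypothetical complement that assumes the import loop can hand a whole batch to a single write call. &#60;code&#62;import_in_batches&#60;/code&#62; and its options are illustrative names, not part of the cassandra or thrift_client gems.&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;
```ruby
# Hypothetical helper (not from any gem): write records in small batches
# and retry a failed batch a few times before giving up.
# Options (all assumptions): :batch_size, :retries, :backoff (seconds),
# :retry_on (exception class to retry on). Ruby 1.9-compatible syntax.
def import_in_batches(records, opts = {})
  batch_size = opts.fetch(:batch_size, 50)
  retries    = opts.fetch(:retries, 3)
  backoff    = opts.fetch(:backoff, 0.5)
  retry_on   = opts.fetch(:retry_on, StandardError)

  imported = 0
  records.each_slice(batch_size) do |batch|
    attempts = 0
    begin
      yield batch                    # the caller performs the actual write
    rescue retry_on
      attempts += 1
      raise if attempts > retries    # re-raise once retries are used up
      sleep(backoff * attempts)      # wait a little longer after each failure
      retry                          # re-run the begin block for this batch
    end
    imported += batch.size
  end
  imported                           # number of records written
end
```
&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;For example, you might call &#60;code&#62;import_in_batches(rows, batch_size: 50, retry_on: CassandraThrift::Cassandra::Client::TransportException) { |batch| ... }&#60;/code&#62;, where the block issues one write per batch. Smaller batches trade some throughput for smaller requests. This matches the quoted observation that one-by-one inserts did not crash.&#60;/p&#62;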
&#60;p&#62;*. Update the /etc/cassandra/conf/cassandra.yaml config:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;# Frame size for thrift (maximum field length).
# 0 disables TFramedTransport in favor of TSocket. This option
# is deprecated; we strongly recommend using Framed mode.
thrift_framed_transport_size_in_mb: 150

# The max length of a thrift message, including all fields and
# internal thrift overhead.
thrift_max_message_length_in_mb: 160&#60;/code&#62;&#60;/pre&#62;</description>
		</item>

	</channel>
</rss>
