The sstable2json utility converts the on-disk SSTable representation of a table into a JSON-formatted document. Its counterpart, json2sstable, does the opposite: it converts a JSON representation of a table into an SSTable that Cassandra can use. Converting SSTables this way is useful for testing and debugging.
Note
Starting with version 0.7, json2sstable and sstable2json must be run in an environment where the schema can be loaded from the system tables. This means that the cassandra.yaml file must be in the classpath and must refer to valid storage directories. For more information, see the Import/Export section of http://wiki.apache.org/cassandra/Operations.
sstable2json converts the on-disk SSTable representation of a table into a JSON-formatted document.
bin/sstable2json SSTABLE
[-k KEY [-k KEY [...]]] [-x KEY [-x KEY [...]]] [-e]
SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra’s data directory. For example, /var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.
-k allows you to include a specific set of keys. The KEY must be in HEX format. Limited to 500 keys.
-x allows you to exclude a specific set of keys. Limited to 500 keys.
-e enumerates the keys only, without dumping the column data.
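Because -k expects hex-encoded keys and accepts at most 500 of them, it can help to build the argument list programmatically. The following is a minimal sketch, assuming row keys are either UUIDs or UTF-8 text; the helper names key_to_hex and build_sstable2json_args are hypothetical and not part of Cassandra.

```python
import uuid

MAX_KEYS = 500  # limit stated for -k in the option list above


def key_to_hex(key):
    """Return the hex form of a row key given as a UUID or a text string."""
    if isinstance(key, uuid.UUID):
        return key.hex  # 16 raw bytes rendered as 32 hex digits
    return key.encode("utf-8").hex()


def build_sstable2json_args(sstable_path, keys):
    """Build an argument list for sstable2json that includes only the given keys."""
    if len(keys) > MAX_KEYS:
        raise ValueError("sstable2json accepts at most %d keys" % MAX_KEYS)
    args = ["bin/sstable2json", sstable_path]
    for key in keys:
        args += ["-k", key_to_hex(key)]
    return args
```

The resulting list can be handed to a process launcher such as subprocess.run; the sketch itself does not start anything.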
The output of sstable2json for tables is:
{
ROW_KEY:
{
[
[COLUMN_NAME, COLUMN_VALUE, COLUMN_TIMESTAMP, IS_MARKED_FOR_DELETE],
[COLUMN_NAME, ... ],
...
]
},
ROW_KEY:
{
...
},
...
}
Row keys, column names, and values are written as the hex representation of their byte arrays. In the actual output, line breaks appear only between row keys.
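To read a hex-encoded key, convert it back to bytes and, where the bytes are text, decode them. A small sketch, assuming UTF-8 text keys; decode_hex_key is a hypothetical helper, and the sample key is the one that appears in the counter example later on this page.

```python
def decode_hex_key(hex_key, encoding="utf-8"):
    """Decode a hex-encoded key to text, or return None if the bytes are not text."""
    raw = bytes.fromhex(hex_key)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None  # binary key (for example a UUID); keep the hex form instead


print(decode_hex_key("7777772e64617461737461782e636f6d"))  # www.datastax.com
```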
The output of the sstable2json command reveals the life cycle of Cassandra data. In this procedure, you use sstable2json to view data in a row that is not scheduled to expire, data that has been evicted and marked with a tombstone, and a row that has had data removed from it.
Create the playlists table in the music keyspace.
Insert the row of data about ZZ Top in playlists:
cqlsh> INSERT INTO music.playlists (id, song_order, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
1,
a3e64f8f-bd44-4f28-b8d9-6938726e34d4,
'La Grange',
'ZZ Top',
'Tres Hombres'
);
Flush the data to disk. For example:
sudo ./nodetool flush music playlists
You need to have access permission to the data directories to flush data to disk.
Look at the JSON representation of the SSTable data, for example:
sudo ./sstable2json /var/lib/cassandra/data/music/playlists/music-playlists-ib-1-Data.db
Output is:
[
{"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","",1370179611971000], [
"1:album","Tres Hombres",1370179611971000], [
"1:artist","ZZ Top",1370179611971000], [
"1:song_id","a3e64f8f-bd44-4f28-b8d9-6938726e34d4",1370179611971000], ["1:title","La Grange",1370179611971000]]}
]
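The output above is ordinary JSON, so it can be loaded with a standard parser for inspection. A minimal sketch, assuming that column names have the form "<song_order>:<column>" as they do in this sample; rows_to_dicts is a hypothetical helper.

```python
import json

# Sample output copied from the example above.
SAMPLE = '''[
{"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","",1370179611971000], [
"1:album","Tres Hombres",1370179611971000], [
"1:artist","ZZ Top",1370179611971000], [
"1:song_id","a3e64f8f-bd44-4f28-b8d9-6938726e34d4",1370179611971000], ["1:title","La Grange",1370179611971000]]}
]'''


def rows_to_dicts(text):
    """Map row key -> song_order -> {column: value}, skipping the marker cell."""
    result = {}
    for row in json.loads(text):
        by_order = {}
        for column in row["columns"]:
            name, value = column[0], column[1]
            # Assumption: names look like "<song_order>:<column>".
            song_order, _, cql_column = name.partition(":")
            if cql_column:  # "1:" is the marker cell with no column name
                by_order.setdefault(song_order, {})[cql_column] = value
        result[row["key"]] = by_order
    return result


print(rows_to_dicts(SAMPLE)["62c3609282a13a0093d146196ee77204"]["1"]["title"])  # La Grange
```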
Specify the time-to-live (TTL) for the ZZ Top row, for example 300 seconds.
cqlsh> INSERT INTO music.playlists
(id, song_order, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
1,
a3e64f8f-bd44-4f28-b8d9-6938726e34d4,
'La Grange',
'ZZ Top',
'Tres Hombres')
USING TTL 300;
After you re-insert the row with a TTL, Cassandra marks the row's columns with tombstones so that the data expires. To have Cassandra remove the entire row, list all of the columns when you re-insert the data.
Flush the data to disk again. Do this after setting the TTL, but before the time-to-live elapses and the data is removed.
Run the sstable2json command again.
sudo ./sstable2json /var/lib/cassandra/data/music/playlists/music-playlists-ib-2-Data.db
The tombstone markers ("e" followed by the TTL value, 300) are visible in the JSON representation of the data.
[
{"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","",1370179816450000,"e",300,1370180116], [
"1:album","Tres Hombres",1370179816450000,"e",300,1370180116], ["1:artist","ZZ Top",1370179816450000,"e",300,1370180116], ["1:song_id","a3e64f8f-bd44-4f28-b8d9-6938726e34d4",1370179816450000,"e",300,1370180116], [
"1:title","La Grange",1370179816450000,"e",300,1370180116]]}
]
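Each expiring column in the sample above carries six fields. A sketch of how they might be read, assuming the third field is the write time in microseconds and the sixth is the expiration time in epoch seconds; this reading is inferred from the sample, where 1370179816 + 300 = 1370180116. The function name describe_expiring_column is hypothetical.

```python
from datetime import datetime, timezone


def describe_expiring_column(column):
    """Split an expiring column entry into named fields."""
    name, value, write_micros, marker, ttl, expires_at = column
    if marker != "e":
        raise ValueError("not an expiring column: %r" % (column,))
    return {
        "name": name,
        "value": value,
        # Whole seconds only; the sub-second part of the write time is dropped.
        "written": datetime.fromtimestamp(write_micros // 1_000_000, tz=timezone.utc),
        "ttl_seconds": ttl,
        "expires": datetime.fromtimestamp(expires_at, tz=timezone.utc),
    }


info = describe_expiring_column(
    ["1:title", "La Grange", 1370179816450000, "e", 300, 1370180116]
)
print(info["expires"] - info["written"])  # the gap equals the TTL for this sample
```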
After the TTL elapses, flush the data to disk again.
Run the sstable2json command again.
The JSON representation of the column data shows that the tombstones and data values for the ZZ Top row have been deleted from the SSTable. The values are now marked with "d":
sudo ./sstable2json /var/lib/cassandra/data/music/playlists/music-playlists-ib-2-Data.db
Output is:
[
{"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","51ab4a14",1370179816450000,"d"], ["1:album","51ab4a14",1370179816450000,"d"], ["1:artist","51ab4a14",1370179816450000,"d"], ["1:song_id","51ab4a14",1370179816450000,"d"], ["1:title","51ab4a14",1370179816450000,"d"]]}
]
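In this sample, the value that replaces each column's data, 51ab4a14, reads as a hex-encoded number. Interpreted that way, it matches the expiration time 1370180116 shown in the earlier TTL output. The sketch below only demonstrates that arithmetic; treating the value as a deletion time in epoch seconds is an assumption drawn from the sample, and deletion_time_from_value is a hypothetical helper.

```python
def deletion_time_from_value(hex_value):
    """Interpret the value of a "d" column as a hex-encoded time in epoch seconds."""
    return int(hex_value, 16)


deleted_at = deletion_time_from_value("51ab4a14")
print(deleted_at)  # 1370180116
```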
You can use the sstable2json command to get information about a counter column.
Run the counter example that loads data into a counter column and flushes data to disk.
The counter is initialized to 1.
Run the sstable2json command.
sudo ./sstable2json /var/lib/cassandra/data/counterks/page_view_counts/counterks-page_view_counts-ib-1-Data.db
[
{"key": "7777772e64617461737461782e636f6d","columns": [["home:","",1370187164256000], ["home:counter_value","0001000058852cd0cb9311e2940971f75c7d064100000000000000010000000000000001",1370187164256,"c",-9223372036854775808]]}
]
Increase the counter column by 2 and flush the data to disk again.
Run the sstable2json command again.
The output of sstable2json shows the JSON representation of the counter value:
sudo ./sstable2json /var/lib/cassandra/data/counterks/page_view_counts/counterks-page_view_counts-ib-2-Data.db
[
{"key": "7777772e64617461737461782e636f6d","columns": [[
"home:","",1370187315683000], [
"home:counter_value","0001000058852cd0cb9311e2940971f75c7d064100000000000000010000000000000002",
1370187315683,"c",-9223372036854775808]]}
]
-9223372036854775808 is the timestamp of the last delete.
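The counter value is an opaque hex string, but in both samples above its last 16 bytes read as two big-endian 64-bit integers. The sketch below extracts them, assuming (this is not stated in the source) that the value ends with an 8-byte clock followed by an 8-byte count; read_counter_tail is a hypothetical helper. It also shows that -9223372036854775808 is the smallest signed 64-bit integer, which presumably means that no delete has been recorded.

```python
import struct

INT64_MIN = -(2 ** 63)  # -9223372036854775808, the smallest signed 64-bit integer


def read_counter_tail(hex_value):
    """Return the last two big-endian signed 64-bit integers of a counter value.

    Assumption: the value ends with an 8-byte clock followed by an 8-byte count.
    """
    raw = bytes.fromhex(hex_value)
    clock, count = struct.unpack(">qq", raw[-16:])
    return clock, count


first = "0001000058852cd0cb9311e2940971f75c7d064100000000000000010000000000000001"
second = "0001000058852cd0cb9311e2940971f75c7d064100000000000000010000000000000002"
print(read_counter_tail(first), read_counter_tail(second))  # (1, 1) (1, 2)
```

Under this reading, the second SSTable holds the increment of 2 rather than a running total, so the value from a single SSTable may not equal the counter as a query would report it.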
json2sstable converts a JSON representation of a table (also known as a column family) into an SSTable that Cassandra can use.
bin/json2sstable -K KEYSPACE -c COLUMN_FAMILY JSON SSTABLE
JSON should be a path to the JSON file.
SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra’s data directory. For example, /var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.
The sstablekeys utility is shorthand for sstable2json with the -e option. Instead of dumping all of a table’s data, it dumps only the keys.
bin/sstablekeys SSTABLE
SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra’s data directory. For example, /var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.