Apache Cassandra 1.2 Documentation

The sstable2json / json2sstable utility

The sstable2json utility converts the on-disk SSTable representation of a table into a JSON-formatted document. Its counterpart, json2sstable, does the opposite: it converts a JSON representation of a table into an SSTable that Cassandra can use. Converting SSTables this way is useful for testing and debugging.

Note

Starting with version 0.7, json2sstable and sstable2json must be run in an environment where the schema can be loaded from the system tables. This means that the cassandra.yaml file must be in the classpath and refer to valid storage directories. For more information, see the Import/Export section of http://wiki.apache.org/cassandra/Operations.

sstable2json

This converts the on-disk SSTable representation of a table into a JSON formatted document.

Usage

bin/sstable2json SSTABLE
   [-k KEY [-k KEY [...]]] [-x KEY [-x KEY [...]]] [-e]

SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra’s data directory. For example, /var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.

-k allows you to include a specific set of keys. The KEY must be in HEX format. Limited to 500 keys.

-x allows you to exclude a specific set of keys. Limited to 500 keys.

-e enumerates the keys only, without dumping column data.
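Because -k and -x take keys in hex, it can help to encode keys programmatically before assembling the command line. The following Python sketch is illustrative only. The helper names hex_key and build_sstable2json_args are hypothetical, and the encoding is an assumption that depends on the table's key type: a UUID key is taken as its 16 raw bytes and a text key as its UTF-8 bytes. The 500-key check mirrors the limit stated above and is applied to each option separately.

```python
import uuid

MAX_KEYS = 500  # limit stated for -k and -x in the option descriptions


def hex_key(key):
    """Return the hex form of a row key (assumes UUID or UTF-8 text keys)."""
    if isinstance(key, uuid.UUID):
        return key.hex  # 32 hex digits, no dashes
    return key.encode("utf-8").hex()


def build_sstable2json_args(sstable, include=(), exclude=(), keys_only=False):
    """Build an argument list that follows the documented usage line."""
    if len(include) > MAX_KEYS or len(exclude) > MAX_KEYS:
        raise ValueError("-k and -x are each limited to 500 keys")
    args = ["bin/sstable2json", sstable]
    for key in include:
        args += ["-k", hex_key(key)]
    for key in exclude:
        args += ["-x", hex_key(key)]
    if keys_only:
        args.append("-e")
    return args


args = build_sstable2json_args(
    "/var/lib/cassandra/data/music/playlists/music-playlists-ib-1-Data.db",
    include=[uuid.UUID("62c36092-82a1-3a00-93d1-46196ee77204")],
)
print(" ".join(args))
```

The resulting list can be passed to a process launcher such as subprocess.run, which avoids shell quoting issues.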

Output format

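The output of sstable2json for tables is:

[
   {"key": ROW_KEY,
    "columns": [
       [COLUMN_NAME, COLUMN_VALUE, COLUMN_TIMESTAMP],
       [COLUMN_NAME, ... ],
       ...
    ]},
   {"key": ROW_KEY,
    "columns": [ ... ]},
   ...
]

A column entry can carry additional fields after the timestamp, such as the "e", "d", and "c" markers that appear in the sample output later in this document. Row keys, column names and values are written as the HEX representation of their byte arrays. Line breaks are only in between row keys in the actual output.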
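Going the other way, a hex row key from the output can be turned back into a readable value. This minimal Python sketch uses the two row keys that appear in the sample output later in this document. The function names are hypothetical, and which decoder applies is an assumption about the table's key type (UUID or UTF-8 text).

```python
import uuid


def decode_uuid_key(hex_str):
    """Interpret a hex row key as a 16-byte UUID."""
    return uuid.UUID(bytes=bytes.fromhex(hex_str))


def decode_text_key(hex_str):
    """Interpret a hex row key as UTF-8 text."""
    return bytes.fromhex(hex_str).decode("utf-8")


print(decode_uuid_key("62c3609282a13a0093d146196ee77204"))
# prints 62c36092-82a1-3a00-93d1-46196ee77204
print(decode_text_key("7777772e64617461737461782e636f6d"))
# prints www.datastax.com
```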

Tracking data expiration

The output of the sstable2json command reveals the life cycle of Cassandra data. In this procedure, you use sstable2json to view data in a row that is not scheduled to expire, data that has been evicted and marked with a tombstone, and a row that has had data removed from it.

  1. Create the playlists table in the music keyspace.

  2. Insert the row of data about ZZ Top in playlists:

    cqlsh> INSERT INTO music.playlists (id, song_order, song_id, title, artist, album)
             VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
             1,
             a3e64f8f-bd44-4f28-b8d9-6938726e34d4,
             'La Grange',
             'ZZ Top',
             'Tres Hombres'
           );
    
  3. Flush the data to disk. For example:

    sudo ./nodetool flush music playlists
    

    You need to have access permission to the data directories to flush data to disk.

  4. Look at the JSON representation of the SSTable data, for example:

    sudo ./sstable2json \
      /var/lib/cassandra/data/music/playlists/music-playlists-ib-1-Data.db
    

    Output is:

    [
    {"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","",1370179611971000], [
    "1:album","Tres Hombres",1370179611971000], [
    "1:artist","ZZ Top",1370179611971000], [
    "1:song_id","a3e64f8f-bd44-4f28-b8d9-6938726e34d4",1370179611971000], ["1:title","La Grange",1370179611971000]]}
    ]
    
  5. Specify the time-to-live (TTL) for the ZZ Top row, for example 300 seconds.

    cqlsh> INSERT INTO music.playlists
            (id, song_order, song_id, title, artist, album)
            VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
            1,
            a3e64f8f-bd44-4f28-b8d9-6938726e34d4,
            'La Grange',
            'ZZ Top',
            'Tres Hombres')
           USING TTL 300;
    

    After you reinsert the row with a TTL, Cassandra marks its columns as expiring. You need to list all columns when re-inserting data if you want Cassandra to remove the entire row.

  6. Flush the data to disk again. Do this after the TTL is set, but before the time-to-live elapses and the data is removed.

  7. Run the sstable2json command again.

    sudo ./sstable2json \
      /var/lib/cassandra/data/music/playlists/music-playlists-ib-2-Data.db
    

    The expiration markers ("e" followed by the TTL value, 300, and the expiration time in seconds since the epoch) are visible in the JSON representation of the data.

    [
    {"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","",1370179816450000,"e",300,1370180116], [
    "1:album","Tres Hombres",1370179816450000,"e",300,1370180116], ["1:artist","ZZ Top",1370179816450000,"e",300,1370180116], ["1:song_id","a3e64f8f-bd44-4f28-b8d9-6938726e34d4",1370179816450000,"e",300,1370180116], [
    "1:title","La Grange",1370179816450000,"e",300,1370180116]]}
    ]
    
  8. After the TTL elapses, flush the data to disk again.

  9. Run the sstable2json command again.

    The JSON representation of the column data shows that the TTL markers and data values for the ZZ Top row are gone from the SSTable. The columns are now tombstones marked with "d":

    sudo ./sstable2json \
      /var/lib/cassandra/data/music/playlists/music-playlists-ib-2-Data.db
    

    Output is:

    [
    {"key": "62c3609282a13a0093d146196ee77204","columns": [["1:","51ab4a14",1370179816450000,"d"], ["1:album","51ab4a14",1370179816450000,"d"], ["1:artist","51ab4a14",1370179816450000,"d"], ["1:song_id","51ab4a14",1370179816450000,"d"], ["1:title","51ab4a14",1370179816450000,"d"]]}
    ]
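The markers in the three dumps above can be interpreted programmatically. The following Python sketch is one reading of the sample output, not a specification of the format. It assumes that a column entry with no fourth field is live, that "e" is followed by the TTL and an expiration time in epoch seconds, and that the value of a "d" column is the deletion time in epoch seconds written in hex. The function name describe and the dictionary keys are hypothetical. The sample rows are abbreviated copies of the output from steps 7 and 9.

```python
import json

# Abbreviated copies of the sample output shown in steps 7 and 9.
EXPIRING = '''[{"key": "62c3609282a13a0093d146196ee77204","columns": [
  ["1:","",1370179816450000,"e",300,1370180116],
  ["1:title","La Grange",1370179816450000,"e",300,1370180116]]}]'''
DELETED = '''[{"key": "62c3609282a13a0093d146196ee77204","columns": [
  ["1:","51ab4a14",1370179816450000,"d"],
  ["1:title","51ab4a14",1370179816450000,"d"]]}]'''


def describe(column):
    """Classify one column entry by its optional marker (assumed meanings)."""
    name, value, timestamp_us = column[:3]
    marker = column[3] if len(column) > 3 else None
    if marker == "e":
        # Assumption: the fields after "e" are TTL and expiry (epoch seconds).
        return {"name": name, "state": "expiring", "ttl": column[4],
                "expires_at": column[5],
                "written_at": timestamp_us // 1_000_000}
    if marker == "d":
        # Assumption: the value holds the deletion time as hex epoch seconds.
        return {"name": name, "state": "deleted",
                "deleted_at": int(value, 16)}
    return {"name": name, "state": "live", "value": value}


for document in (EXPIRING, DELETED):
    for row in json.loads(document):
        for column in row["columns"]:
            print(describe(column))
```

Under these assumptions the numbers are self-consistent: the write time 1370179816 plus the TTL of 300 gives the expiration time 1370180116, and the hex value 51ab4a14 in the deleted columns is also 1370180116.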
    

Tracking counter columns

You can use the sstable2json command to get information about a counter column.

  1. Run the counter example that loads data into a counter column and flushes data to disk.

    The counter is initialized to 1.

  2. Run the sstable2json command.

     sudo ./sstable2json \
       /var/lib/cassandra/data/counterks/page_view_counts/counterks-page_view_counts-ib-1-Data.db
    
    [
    {"key": "7777772e64617461737461782e636f6d","columns": [["home:","",1370187164256000], ["home:counter_value","0001000058852cd0cb9311e2940971f75c7d064100000000000000010000000000000001",1370187164256,"c",-9223372036854775808]]}
    ]
    
  3. Increase the counter column by 2 and flush the data to disk again.

  4. Run the sstable2json command again.

    The output of sstable2json shows the JSON representation of the counter value:

    sudo ./sstable2json \
      /var/lib/cassandra/data/counterks/page_view_counts/counterks-page_view_counts-ib-2-Data.db
    
    [
    {"key": "7777772e64617461737461782e636f6d","columns": [[
    "home:","",1370187315683000], [
    "home:counter_value","0001000058852cd0cb9311e2940971f75c7d064100000000000000010000000000000002",
    1370187315683,"c",-9223372036854775808]]}
    ]
    

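    -9223372036854775808 is the timestamp of the last delete. It is the smallest signed 64-bit value, which indicates that no delete has been recorded for the counter.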
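The hex counter values in the two dumps differ only in their final bytes (...01 and ...02). The Python sketch below is a guess at how to read them, not something this document specifies: it assumes that the value ends with two big-endian signed 64-bit integers, labeled clock and count here (hypothetical names). It also confirms that -9223372036854775808 is the minimum signed 64-bit value.

```python
import struct

LONG_MIN = -(2 ** 63)  # smallest signed 64-bit integer

FIRST = ("0001000058852cd0cb9311e2940971f75c7d0641"
         "00000000000000010000000000000001")
SECOND = ("0001000058852cd0cb9311e2940971f75c7d0641"
          "00000000000000010000000000000002")


def last_shard(context_hex):
    """Return (clock, count) read from the final 16 bytes.

    Assumption: the counter value ends with two big-endian signed
    64-bit integers; the document does not describe the layout.
    """
    raw = bytes.fromhex(context_hex)
    clock, count = struct.unpack(">qq", raw[-16:])
    return clock, count


print(last_shard(FIRST))   # prints (1, 1)
print(last_shard(SECOND))  # prints (1, 2)
print(LONG_MIN == -9223372036854775808)  # prints True
```

Each SSTable holds only what was flushed at that point, so the second value may reflect the increment of 2 by itself, not a running total.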

json2sstable

This converts a JSON representation of a table (aka column family) into an SSTable that Cassandra can use.

Usage

bin/json2sstable -K KEYSPACE -c COLUMN_FAMILY JSON SSTABLE

JSON should be a path to the JSON file.

SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra’s data directory. For example, /var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.
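As a rough illustration of preparing input for json2sstable, the following Python sketch serializes rows and assembles the documented argument list without running the tool. It assumes that json2sstable accepts the same list of key/columns objects that sstable2json emits in the examples in this document. The helper names and the playlists.json file name are hypothetical.

```python
import json


def to_json_document(rows):
    """Serialize {hex_row_key: [column, ...]} as a list of key/columns objects.

    Assumption: this mirrors the shape of the sstable2json sample output.
    """
    return json.dumps(
        [{"key": key, "columns": columns} for key, columns in rows.items()]
    )


def build_json2sstable_args(keyspace, column_family, json_path, sstable_path):
    """Build an argument list that follows the documented usage line."""
    return ["bin/json2sstable", "-K", keyspace, "-c", column_family,
            json_path, sstable_path]


rows = {
    "62c3609282a13a0093d146196ee77204": [
        ["1:title", "La Grange", 1370179611971000],
    ],
}
document = to_json_document(rows)
args = build_json2sstable_args(
    "music",
    "playlists",
    "playlists.json",  # hypothetical file that would hold `document`
    "/var/lib/cassandra/data/music/playlists/music-playlists-ib-1-Data.db",
)
print(document)
print(" ".join(args))
```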

sstablekeys

The sstablekeys utility is shorthand for sstable2json with the -e option. Instead of dumping all of a table’s data, it dumps only the keys.

Usage

bin/sstablekeys SSTABLE

SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra’s data directory. For example, /var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.