DataStax Enterprise 3.0 Documentation

Expiring a DSE Search column

This documentation corresponds to an earlier product version. Make sure this document corresponds to your version.


Using CQL, you can update a DSE Search column to set an expiration time for the column. After the time-to-live (TTL) elapses, the column is eventually removed from the database.

To set a DSE Search column to expire, add a field named ttl_expire to the schema. Next, update the column using CQL to set the time-to-live (TTL) option. The following section shows you the step-by-step procedure.

To expire a DSE Search column

This procedure builds upon the Wikipedia demo to expire a DSE Search column.

  1. Make the wikipedia demo directory your current directory. Modify the sample schema.xml file of the Wikipedia demo to add the ttl_expire field:

    <field name="ttl_expire" type="string" indexed="true" stored="true"/>
    
  2. Post the schema and Solr configuration file for the Wikipedia demo by rerunning the demo script. On Linux, for example:

    sudo ./1-add-schema.sh
    
  3. Index the articles contained in the wikipedia-sample.bz2 file in the demo directory. For example:

    sudo ./2-index.sh --wikifile wikipedia-sample.bz2
    

    Three thousand articles load.

  4. Start cqlsh.

To test expiration of a DSE Search column

  1. On the cqlsh command line, use the wiki keyspace, and then alter the solr table to set gc_grace_seconds to 0.

    USE wiki;
    ALTER TABLE solr WITH gc_grace_seconds = 0;
    

    Setting gc_grace_seconds to 0 causes the column to be removed as soon as its TTL expires.

  2. Use the CQL UPDATE command to update the Solr columns, or to create them if they don't exist. For example, set TTL values on two nonexistent rows:

    UPDATE solr USING TTL 10
      SET title='testtitle', body='solr body'
      WHERE KEY='key1';
    
    UPDATE solr USING TTL 3600
      SET title='testtitle2', body='solr body'
      WHERE KEY='key2';
    
  3. Wait at least 10 seconds, and then query the database to check that the columns written with the title testtitle have been removed and that the columns written with the title testtitle2 have not yet been removed.

    SELECT * FROM solr WHERE solr_query='title:testtitle';
    
    SELECT * FROM solr WHERE solr_query='title:testtitle2';
    

    The first query returns no results after 10 seconds. The second query returns the key2 row as long as an hour (3600 seconds) has not elapsed.
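The timing in this test comes down to simple arithmetic: a column written with a TTL of n seconds becomes eligible for expiration n seconds after the write. The following minimal Python sketch models only that arithmetic. The function names are hypothetical, and the sketch does not model gc_grace_seconds, compaction, or the Solr index.

```python
def expires_at(written_at: float, ttl_seconds: int) -> float:
    """Return the time (in seconds) at which a column written with a TTL expires."""
    return written_at + ttl_seconds


def is_expired(written_at: float, ttl_seconds: int, now: float) -> bool:
    """Return True once the TTL has fully elapsed since the write."""
    return now >= expires_at(written_at, ttl_seconds)


if __name__ == "__main__":
    # Both rows are written at t=0; check them 11 seconds later.
    print("key1 expired:", is_expired(0, 10, now=11))
    print("key2 expired:", is_expired(0, 3600, now=11))
```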

Managing expired columns

After Cassandra expires a column using the time-to-live (TTL) mechanism, DSE Search/Solr can still find the expired column. The column data remains in the index until one of the following conditions is met:

Setting the ttl rebuild timeout properties is recommended for managing expired columns.

Forcing re-indexing

To force re-indexing, periodically poll the column family and re-index Solr whenever it contains an expired column. To make polling possible, add an expiring-time field to the index document; searching on that field finds the expired documents that need re-indexing. You can configure one scheduler per Solr core to search for expired documents periodically and re-index them. Because the Solr secondary index is a per-row secondary index, re-indexing an expired column actually re-indexes the whole row.

TTL re-indexing consumes resources such as CPU and memory, and it reads the Cassandra column families. If the column families contain little TTL data, you can re-index less often. Typically, you need to check how frequently compaction runs and set the re-indexing frequency to a value less than the compaction frequency but still suitable for your business requirements.
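The polling approach described in this section can be sketched conceptually as follows. This is a simplified in-memory model, not DSE Search's implementation: the index is a plain dictionary, `read_row` is a hypothetical callback standing in for a read of the Cassandra row, and `ttl_expire` is assumed to hold a numeric expiration time (the demo schema declares the field as a string).

```python
import math
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def find_expired(index: Dict[str, Row], now: float) -> List[str]:
    """Search the expiring-time field for documents whose TTL has elapsed."""
    return sorted(
        key for key, doc in index.items()
        if float(doc.get("ttl_expire", math.inf)) <= now
    )


def reindex_expired(
    index: Dict[str, Row],
    read_row: Callable[[str], Optional[Row]],
    now: float,
) -> List[str]:
    """Re-index every expired document from its whole row.

    read_row (hypothetical) returns the live columns of a row, or None
    when every column in the row has expired.
    """
    expired = find_expired(index, now)
    for key in expired:
        row = read_row(key)
        if row is None:
            # Nothing is left in the row, so drop the document from the index.
            del index[key]
        else:
            # Per-row index: rebuild the whole document from the live columns.
            index[key] = dict(row)
    return expired


if __name__ == "__main__":
    index = {
        "key1": {"title": "testtitle", "ttl_expire": 10.0},
        "key2": {"title": "testtitle2", "ttl_expire": 3600.0},
    }
    table = {"key2": {"title": "testtitle2", "ttl_expire": 3600.0}}
    # One polling pass, 20 seconds after the writes.
    print(reindex_expired(index, table.get, now=20.0))
    print(sorted(index))
```

A scheduler would call `reindex_expired` once per polling period for each Solr core. Each pass costs a search plus one row read per expired document, which is why the polling period is worth tuning.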

Configuring a scheduler

To configure a scheduler, set the ttl_index configuration parameters in the dse.yaml file.