You can use CQL to update a DSE Search column so that it expires. When the column's time-to-live elapses, Cassandra treats the column as expired and eventually removes it from the database.
To set a DSE Search column to expire, add a field named ttl_expire to the Solr schema, and then update the column using CQL to set the time-to-live (TTL) option. The following procedure shows the steps.
To expire a DSE Search column
This procedure builds upon the Wikipedia demo to expire a DSE Search column.
Make the Wikipedia demo directory your current directory. Modify the demo's sample schema.xml file to add the ttl_expire field:
<field name="ttl_expire" type="string" indexed="true" stored="true"/>
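If you prefer to script the schema change rather than edit schema.xml by hand, the following minimal Python sketch adds the same ttl_expire field. It assumes that field definitions sit under a <fields> element, as in classic Solr schema.xml files, and falls back to the root element otherwise; the function name add_ttl_field is hypothetical. ElementTree discards XML comments and may reformat the file, so review the output before posting the schema.

```python
import xml.etree.ElementTree as ET

# Same attributes as the <field> line shown above.
TTL_FIELD_ATTRS = {
    "name": "ttl_expire",
    "type": "string",
    "indexed": "true",
    "stored": "true",
}


def add_ttl_field(schema_xml: str) -> str:
    """Return schema_xml with a ttl_expire <field> added, if it is missing."""
    root = ET.fromstring(schema_xml)
    # Do nothing if a field with this name is already declared anywhere.
    for field in root.iter("field"):
        if field.get("name") == TTL_FIELD_ATTRS["name"]:
            return schema_xml
    # Assumption: field definitions live under <fields>; otherwise use the root.
    parent = root.find("fields")
    if parent is None:
        parent = root
    ET.SubElement(parent, "field", TTL_FIELD_ATTRS)
    return ET.tostring(root, encoding="unicode")
```

To use it, read schema.xml into a string, pass it to add_ttl_field, and write the result back. Calling the function a second time leaves the schema unchanged.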
Post the schema and Solr configuration file for the Wikipedia demo by rerunning the demo script. On Linux, for example:
Index the articles contained in the wikipedia-sample.bz2 file in the demo directory. For example:
sudo ./2-index.sh --wikifile wikipedia-sample.bz2
Three thousand articles are loaded.
To test expiration of a DSE Search column
On the cqlsh command line, use the wiki keyspace, and then alter the solr table to set gc_grace_seconds to 0:
USE wiki;
ALTER TABLE solr WITH gc_grace_seconds = 0;
Setting gc_grace_seconds to 0 removes the grace period, so the column becomes eligible for removal as soon as its TTL expires.
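The effect of gc_grace_seconds can be sketched as a simplified model: a column written at time t with a given TTL stops being returned by reads at t + TTL, and a compaction may purge it only after gc_grace_seconds more have passed. The class and function names below are illustrative, the boundary handling is approximate, and actual removal still happens only when a compaction runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpiringColumn:
    written_at: int  # write time, in seconds
    ttl: int  # value from USING TTL, in seconds

    @property
    def expires_at(self) -> int:
        return self.written_at + self.ttl


def is_expired(col: ExpiringColumn, now: int) -> bool:
    """Reads stop returning the column once its TTL has elapsed."""
    return now >= col.expires_at


def is_purgeable(col: ExpiringColumn, now: int, gc_grace_seconds: int) -> bool:
    """Simplified rule: a compaction may drop the expired column only after
    gc_grace_seconds have also passed since it expired."""
    return is_expired(col, now) and now > col.expires_at + gc_grace_seconds
```

With the Cassandra default of 864000 seconds (10 days), an expired column can stay on disk for ten more days; with 0, it is purgeable by the next compaction that processes it after expiration.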
Use the CQL UPDATE command to update the columns of a row in the solr table, or to create them if they do not exist. For example, set TTL values on two rows that do not exist yet:
UPDATE solr USING TTL 10 SET title='testtitle', body='solr body' WHERE KEY='key1';
UPDATE solr USING TTL 3600 SET title='testtitle2', body='solr body' WHERE KEY='key2';
After 10 seconds, query the database to check that the entry with the title testtitle was removed from the database, but the entry with the title testtitle2 was not.
SELECT * FROM solr WHERE solr_query='title:testtitle';
SELECT * FROM solr WHERE solr_query='title:testtitle2';
The first query returns no results after 10 seconds. The second query returns the key2 row until an hour (3600 seconds) has elapsed.
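The expected query results can be modeled in a few lines of Python. This sketch assumes that both UPDATE statements run at time 0, and it treats title:... as an exact match on the title value, which simplifies how Solr actually parses and matches queries. Row and query_title are hypothetical names.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    key: str
    title: str
    ttl: int  # seconds, from USING TTL


# Mirrors the two UPDATE statements; both rows are assumed written at time 0.
ROWS = [
    Row("key1", "testtitle", 10),
    Row("key2", "testtitle2", 3600),
]


def query_title(rows: List[Row], title: str, elapsed: int) -> List[str]:
    """Keys of rows whose title matches exactly and whose TTL has not elapsed."""
    return [r.key for r in rows if r.title == title and elapsed < r.ttl]
```

For example, query_title(ROWS, "testtitle", 10) returns an empty list, while query_title(ROWS, "testtitle2", 10) returns ["key2"].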
After Cassandra expires a column using the time-to-live (TTL) mechanism, DSE Search/Solr can still find the expired column. The column data remains in the index until one of the following conditions is met:
Re-indexing occurs due to a DSE Search TTL rebuild timeout.
Set the TTL rebuild timeout properties in the dse.yaml file.
All columns in a row expire due to the Cassandra time-to-live (TTL) mechanism, triggering removal of the entire row/Solr document from the index.
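The two conditions above can be expressed as a small classification function. This is a conceptual model of the documented behavior, not DSE's implementation: index_state, its arguments, and the "removed", "stale", and "current" labels are hypothetical, and expiration times are plain seconds, with None meaning that the column has no TTL.

```python
from typing import Dict, Optional


def index_state(
    column_expiry: Dict[str, Optional[int]],
    now: int,
    last_rebuild: Optional[int] = None,
) -> str:
    """Classify a row's Solr document at time `now`.

    column_expiry maps each column name to its expiration time in seconds,
    or None if the column has no TTL. last_rebuild is the time of the most
    recent TTL rebuild, if any (assumed to be <= now).
    """
    expired = {c: t for c, t in column_expiry.items() if t is not None and now >= t}
    # Every column has expired: the whole row/document leaves the index.
    if column_expiry and len(expired) == len(column_expiry):
        return "removed"
    # Columns that expired after the last rebuild are still searchable.
    if any(last_rebuild is None or t > last_rebuild for t in expired.values()):
        return "stale"
    return "current"
```

A row whose title expires but whose body has no TTL stays "stale" until a rebuild runs after the expiration time.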
Setting the TTL rebuild timeout properties is the recommended way to manage expired columns.
To force re-indexing, you can periodically poll the column family and re-index the Solr documents that contain expired columns. To make polling possible, add an expiration-time field to each indexed document; you can then search on that field to find the expired documents and re-index them. You can configure one scheduler per Solr core to search for expired documents periodically and re-index them. Because the Solr secondary index is a per-row index, re-indexing a document actually re-indexes the whole row.
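The polling approach can be sketched with in-memory stand-ins for the column family and the Solr index. Everything here is hypothetical (Cell, build_document, reindex_expired, and the dict-based table and index), and expiration times are stored as numbers rather than in the string-typed ttl_expire field from schema.xml. The sketch records the earliest pending expiration of each row in ttl_expire, searches for documents whose ttl_expire has passed, and rebuilds each of those documents from the whole row, leaving out expired columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    value: str
    expires_at: Optional[float] = None  # None: the column has no TTL

    def live(self, now: float) -> bool:
        return self.expires_at is None or now < self.expires_at


# Hypothetical stand-ins: row key -> {column name: Cell}, and
# row key -> indexed document.
Table = Dict[str, Dict[str, Cell]]
Index = Dict[str, dict]


def build_document(row: Dict[str, Cell], now: float) -> Optional[dict]:
    """Index only live columns; store the earliest pending expiry in ttl_expire."""
    live = {name: cell for name, cell in row.items() if cell.live(now)}
    if not live:
        return None  # every column expired: the row has no document
    doc = {name: cell.value for name, cell in live.items()}
    pending = [cell.expires_at for cell in live.values() if cell.expires_at is not None]
    doc["ttl_expire"] = min(pending) if pending else None
    return doc


def reindex_expired(table: Table, index: Index, now: float) -> List[str]:
    """One scheduler pass: search the index for documents whose ttl_expire
    has passed, then re-index each whole row from the table."""
    due = [
        key
        for key, doc in index.items()
        if doc.get("ttl_expire") is not None and doc["ttl_expire"] <= now
    ]
    for key in due:
        doc = build_document(table.get(key, {}), now)
        if doc is None:
            del index[key]  # nothing live left: drop the document
        else:
            index[key] = doc  # whole row re-indexed without expired columns
    return due
```

Seeding the index with build_document(row, now) for each row and then calling reindex_expired on a timer mirrors the one-scheduler-per-core idea; each pass returns the keys it re-indexed.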
TTL re-indexing consumes resources, such as CPU, memory, and reads of Cassandra column families. If there is not much TTL data in your Cassandra column families, you can re-index less often. Typically, you need to check how often compaction runs and set the re-indexing frequency to a value lower than the compaction frequency that still meets your business requirements.