Auditing is implemented as a log4j-based integration. DataStax Enterprise places the audit log in the directory indicated by a log4j property. When the file reaches a configured size threshold, it rolls over and is renamed. Rolled-over file names include a numerical suffix determined by the maxBackupIndex property.
The audit logger logs information only on the node configured for logging. For example, if node 0 has auditing turned on and node 1 does not, updates and other commands issued on node 1 generally do not appear in node 0's audit log. To capture the maximum information from data auditing, turn on data auditing on every node. Log4j supports storing the audit data on the file system or in Cassandra.
Because auditing is configured through a text file in the file system, the file is vulnerable to OS-level security breaches. To secure it, store the file on an OS-level encrypted file system, for example using Gazzang.
You can configure which categories of audit events should be logged and also whether operations against any specific keyspaces should be omitted from audit logging.
To configure data auditing, uncomment these properties and ensure that the default values are set:
|Property||Default value||Description|
|log4j.logger.DataAudit||INFO, A||Produces INFO-level logs.|
|log4j.additivity.DataAudit||false||Prevents logging to the root appender.|
|log4j.appender.A||org.apache.log4j.RollingFileAppender||Configures a rolling file appender to write the audit log.|
|log4j.appender.A.File||/var/log/cassandra/audit.log||Sets the file and path of the log file.|
|log4j.appender.A.bufferedIO||true||True improves performance but will not be real time; set to false for testing.|
To disable data auditing, comment out log4j.logger.DataAudit, log4j.additivity.DataAudit, and log4j.appender.A. This removes almost all auditing overhead. The audit logger logs at INFO level, so the DataAudit logger must be configured at INFO (or lower) level in log4j-server.properties. Setting the logger to a higher level, such as WARN, prevents any log events from being recorded, but it does not completely disable data auditing: some overhead occurs beyond that caused by regular processing.
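For example, the disabled section of log4j-server.properties might look like this (a sketch using the property names shown above):

```properties
# Data auditing disabled: the DataAudit logger and its appender are commented out.
#log4j.logger.DataAudit=INFO, A
#log4j.additivity.DataAudit=false
#log4j.appender.A=org.apache.log4j.RollingFileAppender
```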
Set other general options to tune the logging; for example, uncomment these properties and accept the following defaults:
Uncomment and set log4j.appender.A.filter.1.ActiveCategories to ALL or to a combination of these settings:
|ADMIN||Logs describing schema versions, cluster name, version, ring, and other admin events|
|ALL||Logs everything: DDL, DML, queries, and errors|
|AUTH||Logs login events|
|DML||Logs insert, update, delete and other DML events|
|DDL||Logs object and user create, alter, drop, and other DDL events|
|DCL||Logs grant, revoke, create user, drop user, and list users events|
|QUERY||Logs all queries|
Set the ActiveCategories property to a comma-separated list of the categories to include in the audit log output. By default, this list is empty, so unless it is specified, no events are included in the log. Audit events are generated even for categories not included in the log, so be sure to set this property.
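For example, to restrict the audit log to DML, DDL, and login events, combine category names from the table above (an illustrative sketch):

```properties
# Log only data manipulation, schema changes, and login events.
log4j.appender.A.filter.1.ActiveCategories=DML,DDL,AUTH
```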
You can disable logging for specific keyspaces. Set this property as follows to prevent logging of operations against specified keyspaces:
To prevent the audit logger from logging information about itself when using the Cassandra log4j appender, exempt the keyspace from the appender logs.
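For example, to exclude two keyspaces from the audit log (the keyspace names are illustrative, taken from the sample configuration below):

```properties
# Operations against these keyspaces are omitted from the audit log.
log4j.appender.A.filter.1.ExemptKeyspaces=do_not_log,also_do_not_log
```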
The audit log section of the log4j-server.properties file should look something like this:
log4j.logger.DataAudit=INFO, A
log4j.additivity.DataAudit=false
log4j.appender.A=org.apache.log4j.RollingFileAppender
log4j.appender.A.File=/var/log/cassandra/audit.log
log4j.appender.A.bufferedIO=true
log4j.appender.A.maxFileSize=200MB
log4j.appender.A.maxBackupIndex=5
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%m%n
log4j.appender.A.filter.1=com.datastax.bdp.cassandra.audit.AuditLogFilter
log4j.appender.A.filter.1.ActiveCategories=ALL
log4j.appender.A.filter.1.ExemptKeyspaces=do_not_log,also_do_not_log
The log format is a simple set of pipe-delimited name/value pairs: pairs are separated by the pipe symbol ("|"), and the name and value portions of each pair are separated by a colon. A name/value pair, or field, is included in the log line only if a value exists for that particular event. Some fields always have a value and are always present; others might not be relevant for a given operation. The order in which fields appear (when present) in the log line is predictable, which makes parsing with automated tools easier. For example, the text of CQL statements is unquoted but, if present, is always the last field in the log line.
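The layout described above can be parsed with a short script. This is a sketch, assuming (as stated above) that the optional operation field is always last and that no field value before it contains a pipe:

```python
def parse_audit_line(line):
    """Parse a pipe-delimited audit log line into a dict of field -> value.

    The optional 'operation' field is assumed to be last, so any pipes
    inside the CQL statement text are kept intact.
    """
    fields = {}
    rest = line.strip()
    while rest:
        name, _, value = rest.partition(":")
        if name == "operation":
            # operation is always the last field; keep everything after the colon
            fields[name] = value
            break
        value, _, rest = value.partition("|")
        fields[name] = value
    return fields

# Sample line adapted from the SELECT example in this document.
line = ("host:/192.168.56.1|source:/192.168.56.101"
        "|timestamp:1351003741953|category:QUERY|type:CQL_SELECT"
        "|ks:dsp904|cf:t0|operation:select * from t0;")
print(parse_audit_line(line)["category"])  # prints QUERY
```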
|Field Label||Field Value||Optional|
|host||dse node address||no|
|timestamp||system time of log event||no|
|category||DML/DDL/QUERY for example||no|
|type||API level operation||no|
The textual description value for the operation field label is currently only present for CQL.
Auditing is completely separate from authorization, although the data points logged include the client address and the authenticated user, which may be a generic user if the default authenticator is not overridden. Logging of requests can be activated for any or all of the categories accepted by log4j.appender.A.filter.1.ActiveCategories (shown in step 3 in Configuring data auditing).
Generally, SELECT queries are placed into the QUERY category. The INSERT, UPDATE, and DELETE statements are categorized as DML. CQL statements that affect schema, such as CREATE KEYSPACE and DROP KEYSPACE, are categorized as DDL.
USE dsp904;

host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351003707937|category:DML|type:SET_KS|ks:dsp904|operation:use dsp904;

USE dsp904;

host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351004648848|category:DML|type:SET_KS|ks:dsp904

SELECT * FROM t0;

host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351003741953|category:QUERY|type:CQL_SELECT|ks:dsp904|cf:t0|operation:select * from t0;

BEGIN BATCH
  INSERT INTO t0(id, field0) VALUES (0, 'foo')
  INSERT INTO t0(id, field0) VALUES (1, 'bar')
  DELETE FROM t1 WHERE id = 2
APPLY BATCH;

host:192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351005482412|category:DML|type:CQL_UPDATE|batch:fc386364-245a-44c0-a5ab-12f165374a89|ks:dsp904|cf:t0|operation:INSERT INTO t0 ( id , field0 ) VALUES ( 0 , 'foo' )
host:192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351005482413|category:DML|type:CQL_UPDATE|batch:fc386364-245a-44c0-a5ab-12f165374a89|ks:dsp904|cf:t0|operation:INSERT INTO t0 ( id , field0 ) VALUES ( 1 , 'bar' )
host:192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351005482413|category:DML|type:CQL_DELETE|batch:fc386364-245a-44c0-a5ab-12f165374a89|ks:dsp904|cf:t1|operation:DELETE FROM t1 WHERE id = 2
CQL DROP KEYSPACE
DROP KEYSPACE dsp904;

host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351004777354|category:DDL|type:DROP_KS|ks:dsp904|operation:drop keyspace dsp904;
CQL prepared statement
host:/10.112.75.154|source:/127.0.0.1|user:allow_all|timestamp:1356046999323|category:DML|type:CQL_UPDATE|ks:ks|cf:cf|operation:INSERT INTO cf (id, name) VALUES (?, ?) [id=1,name=vic]
host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351005073561|category:DML|type:INSERT|batch:7d13a423-4c68-4238-af06-a779697088a9|ks:Keyspace1|cf:Standard1
host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351005073562|category:DML|type:INSERT|batch:7d13a423-4c68-4238-af06-a779697088a9|ks:Keyspace1|cf:Standard1
host:/192.168.56.1|source:/192.168.56.101|user:#<User allow_all groups=>|timestamp:1351005073562|category:DML|type:INSERT|batch:7d13a423-4c68-4238-af06-a779697088a9|ks:Keyspace1|cf:Standard1
Batch updates, whether received via a Thrift batch_mutate call or in a CQL BEGIN BATCH ... APPLY BATCH block, are logged in the following way: a UUID is generated for the batch, then each individual operation is reported separately, with an extra field containing the batch id.
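The batch id field makes it easy to reassemble a batch from its individual log lines. A sketch of grouping audit lines by their batch field (the lines and shortened batch id are illustrative; the pipe split assumes no pipes appear before the batch field):

```python
from collections import defaultdict

def batch_id(line):
    """Return the value of the batch field in an audit log line, or None."""
    for pair in line.split("|"):
        name, _, value = pair.partition(":")
        if name == "batch":
            return value
    return None

# Illustrative audit lines: two from one batch, one standalone query.
lines = [
    "host:/192.168.56.1|category:DML|type:INSERT|batch:7d13a423|ks:Keyspace1|cf:Standard1",
    "host:/192.168.56.1|category:DML|type:INSERT|batch:7d13a423|ks:Keyspace1|cf:Standard1",
    "host:/192.168.56.1|category:QUERY|type:CQL_SELECT|ks:dsp904|cf:t0",
]

# Group lines by batch id; non-batch lines collect under None.
groups = defaultdict(list)
for entry in lines:
    groups[batch_id(entry)].append(entry)

print(len(groups["7d13a423"]))  # prints 2
```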
By default, DSE Search/Solr nodes need no configuration for data auditing other than setting up the log4j-server.properties file. If the filter-mapping element in the Solr web.xml file is commented out, however, the auditor cannot log anything from Solr, and you need to configure auditing as described in the next section.
If necessary, uncomment the filter-mapping element in the Solr web.xml.
<filter-mapping>
  <filter-name>DseAuditLoggingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
The Solr web.xml is located in the following directory:
Here is an example of the data audit log of a Solr query: