Technology•April 9, 2019

DataStax Bulk Loader Pt. 2 — More Loading

Brian Hess

$ dsbulk load -url /tmp/dsbulkblog/iris_with_comment.csv -k dsbulkblog -t iris_with_id -comment "#"

$ cqlsh -e "TRUNCATE dsbulkblog.iris_with_id;"

$ dsbulk load -url /tmp/dsbulkblog/iris_with_nulls.csv -k dsbulkblog -t iris_with_id

id  | petal_length | petal_width | sepal_length | sepal_width | species

-----+--------------+-------------+--------------+-------------+---------

(3 rows)

$ dsbulk load -url /tmp/dsbulkblog/iris_with_null_string.csv -k dsbulkblog -t iris_with_id

$ cqlsh -e "SELECT * FROM dsbulkblog.iris_with_id LIMIT 3;"

id  | petal_length | petal_width | sepal_length | sepal_width | species

-----+--------------+-------------+--------------+-------------+---------

$ cqlsh -e "TRUNCATE dsbulkblog.iris_with_id;"

$ dsbulk load -url /tmp/dsbulkblog/iris_with_null_string.csv -k dsbulkblog -t iris_with_id -nullStrings "NULL"

$ cqlsh -e "SELECT * FROM dsbulkblog.iris_with_id LIMIT 3;"

id  | petal_length | petal_width | sepal_length | sepal_width | species

-----+--------------+-------------+--------------+-------------+---------

$ dsbulk load -url /tmp/dsbulkblog/president_birthdates.psv -k dsbulkblog -t president_birthdates -delim "|"

$ dsbulk load -url /tmp/logs/LOAD_20190314-164545-280529/mapping.bad -k dsbulkblog -t president_birthdates -delim "|" -header false -m "president,birthdate_string,birthdate" --codec.date "y.M.d"

$ dsbulk load -url /tmp/logs/LOAD_20190314-164545-280529/mapping.bad -k dsbulkblog -t president_birthdates -delim "|" -header false -m `head -1 /tmp/dsbulkblog/president_birthdates.psv | sed 's/|/,/g'` --codec.date "y.M.d"

$ dsbulk load -url /tmp/dsbulkblog/president_birthdates.psv -delim "|" -query "INSERT INTO dsbulkblog.president_birthdates(president,birthdate,birthdate_string) VALUES (:president, CurrentDate(), :birthdate_string)"

$ dsbulk load -url /tmp/dsbulkblog/president_birthdates.psv -k dsbulkblog -t president_birthdates -delim "|" -m "president=president, CurrentDate()=birthdate, birthdate_string=birthdate_string"

"Baseball Teams","[\"Boston Red Sox\", \"Atlanta Braves\", \"San Diego Padres\", \"Washington Senators\"]"

$ dsbulk load -url /tmp/dsbulkblog/sportsteams.csv -k dsbulkblog -t categories_list

$ dsbulk load -url /tmp/dsbulkblog/sportsteams.csv -k dsbulkblog -t categories_set

$ dsbulk load -url /tmp/dsbulkblog/iris.csv -k dsbulkblog -t iris_with_id --schema.queryTtl 3600

$ dsbulk load -url /tmp/dsbulkblog/iris.csv -k dsbulkblog -t iris_with_id --schema.queryTimestamp "2018-01-01T12:34:56+00:00"

$ cat /tmp/dsbulkblog/iris_no_header.csv | awk -F, '{printf("%s,%d\n", $0, $6+10000);}' | dsbulk load -k dsbulkblog -t iris_with_id -m "sepal_length,sepal_width,petal_length,petal_width,species,id,__ttl" -header false

$ cqlsh -e "SELECT id, species, Ttl(species) FROM dsbulkblog.iris_with_id LIMIT 5"

id | species         | ttl(species)

-----+-----------------+--------------

(5 rows)

$ cat /tmp/dsbulkblog/iris_no_header.csv | awk -F, 'BEGIN{printf("sepal_length,sepal_width,petal_length,petal_width,species,id,ttl_to_use\n");} {printf("%s,%d\n", $0, $6+10000);}' | dsbulk load -query "INSERT INTO dsbulkblog.iris_with_id(sepal_length,sepal_width,petal_length,petal_width,species,id) VALUES (:sepal_length,:sepal_width,:petal_length,:petal_width,:species,:id) USING TTL :ttl_to_use"

$ dsbulk load -url /tmp/dsbulkblog/iris.csv -k dsbulkblog -t iris_with_id -dryRun true

total | failed | rows/s | mb/s | kb/row | p50ms | p99ms | p999ms | batches

150 |      0 | 0 | 0.00 |   0.00 | 0.00 | 0.00 |   0.00 | 1.00

$ dsbulk load -url /tmp/dsbulkblog/iris.csv -k dsbulkblog -t iris_with_id --executor.maxPerSecond 10

$ dsbulk load -url /tmp/dsbulkblog/iris.csv -k dsbulkblog -t iris_with_id --executor.maxInFlight 3

Discover more

Data ProcessingDataStax Bulk Loader

More Technology

View All

Technology • April 18, 2024

Knowledge Graphs for RAG without a GraphDB

Technology • April 17, 2024

How Winweb Built its AI Assistant with DataStax Astra DB and LangChain

Technology • April 16, 2024

Vercel + Astra DB: Get Data into Your GenAI Apps Fast

Technology • April 11, 2024

Simplifying Agent Development with Astra DB Connector for Vertex AI Search

One-stop Data API for Production GenAI

Astra DB gives JavaScript developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.

Learn More

Get Started for Free

DataStax Bulk Loader Pt. 2 — More Loading

Brian Hess

Discover more

Share

Share

Example 5: Comments

Example 6: Nulls

Example 6.1: Null strings

Example 6.2: Unset versus NULL

Example 7: Delimiters

Example 8: Date Formats

Example 9: CurrentDate(), Now(), etc

Example 8.1: Using mappings for CurrentDate(), Now(), etc

Example 10: Collections

Example 11: TTLs and Timestamps

Example 11.1: TTLs

Example 11.2: Timestamps

Example 11.3: TTL from input data

Example 11.4: TTL from input data using a custom query

Example 12: Dry Run

Example 13: Controlling the rate and in-flight requests

Example 13.1: Controlling the rate