This is an old revision of the document!
VisiData is an interactive CLI table explorer with vim key bindings. Here is a really great tutorial. These are notes to myself.
Gather data with more granularity than you think you need. You often need three tables:
$ vd logs.csv active_devices.csv id_name.csv
| Key | Meaning |
|---|---|
| _ | Adjust widths of columns |
| @ | Column type is date. |
| ! | Column is important. (Will be key for table merging.) |
| # | Column is integer. (Set aggregator to sum.) |
| + | Set aggregator, usually to “sum” |
| ; | Extract regex to new column. Ex, (video|audio), (^..) or (^([STUVW]...|..)) |
| ^ | rename the column. Might have to be “product_id” or “platform_id” |
| - | Hide column |
| S | Go to “Sheets” sheet, to select another sheet to format. |
| q | Leave the sheet. (gq quit all sheets) |
| F | Make a Frequency Table for the selected column. (gF for selected columns) |
| U | Undo. R Redo. |
| d | Delete row. (I only use this on Sheets sheet.) |
| , | Select all rows that match this column's value |
| “ | Open duplicate sheet with only selected rows |
PG&E CSVs come with 5 rows of metadata followed by Type, Date, Start Time, End Time, Usage, Units, Cost, Notes columns. Delete the five rows of metadata in a text editor, or use tail to remove them like so:
tail +6 pge_electric_interval_data.csv | vd -f csv -
Then prepare your PG&E data like so:
| Key | Meaning |
|---|---|
| - | Hide columns TYPE, END TIME, UNITS and NOTES |
| C | Go to column mode and… |
| t | Select the DATE and START TIME columns |
| & | Make a new column that merges them |
| q, - | Quit the Column mode, hide DATE and START TIME columns |
| O | Go to options mode and… |
| e | Set disp_date_fmt to %Y-%m-%d %H:%M |
| q | Quit options mode. |
| @!, %, $ | Set DATE_START_TIME to date format and important, USAGE to float, COST to currency |
| = | Add a column, enter COST/USAGE, (Make it float with %) |
| ^ | Rename COST/USAGE to kWh rate |
| . or g. | Select columns to graph them. Notice rate changes. Notice times of high use. |
| +, - | Navigate with hjkl, zoom in and out (more) |
| Key | Meaning |
|---|---|
| C | Go to column view, each row is data for a column. |
| t | Select rows, select all integer rows. |
| ge | Go to “type” column, enter ge (global edit), type “int” |
Go to “aggregators”, enter ge, type “sum” |
|
| q | Leave that table. |
For example, your “logs.csv” would end up looking like:
Your active_devices.csv would look like this:
Use ! to set Key columns. Use F (or gF) to remove dupes of Keys by making a Frequency Table.
| Key | Meaning |
|---|---|
The key columns to join have to match names. Check and rename with ^. |
|
| S | Go to Sheets sheet |
| t | (toggle) Select the sheets you want to join by key columns. |
| & | (or g&) Join sheets. Type “inner” for an inner join. |
| gu | Next time you're in Sheets, “global unselect” to unselect the sheets you selected. |
| Key | Meaning |
|---|---|
| z| | Select rows matching Python expression. Type “blits > 10000” |
| ” | Make a new sheet with selected rows |
And the merge of the logs and active devices sheets would look like:
Do a similar thing to join the Joined Sheet with the product Name sheet.
In your final Joined sheet, add rows that are calculations of other rows.
| Key | Meaning |
|---|---|
| HL | (JK for rows) Move columns |
| - | Remove columns |
Maybe use C to assure desired columns are ints and aggregators are “sum” |
|
| ! | Toggle key columns. |
| gF | global Frequency Table so key columns are unique. |
| = | Add column. Enter count_sum * 100 / active_devices_sum |
| % | Set new column as float. |
| _ | Resize columns |
| ] | Sort by column |
| . | (g. for multiple columns) Make a graph based on Key column. |
Now you should have some basic info to keep exploring. When you want to save what you've done:
| Key | Meaning |
|---|---|
| Ctrl+s | Save sheet |
| Ctrl+d | Save command log (cmdlog). Use extension “.vd”. You can replay what you've done later! |
The commandlog is a CSV file too. The first lines are the open file commands. So to replay, you only have to specify the command log like so:
$ vd --play=my_cmdlog.vd --replay-wait=0.5