This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision Next revision | Previous revision | ||
| vd [2021/06/11 12:09] dblume [Process Data] | vd [2023/04/10 10:27] (current) dblume | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== VisiData ====== | ====== VisiData ====== | ||
| - | **[[https:// | + | **[[https:// | 
| - | ====== Tips ====== | + | ====== Source Table Recommendations | 
| - | + | ||
| - | ===== Source Table Recommendations ===== | + | |
| Gather data with more granularity than you think you need. You often need three tables: | Gather data with more granularity than you think you need. You often need three tables: | ||
| Line 13: | Line 11: | ||
| - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.) | - **Mappings from product ID to product name**: (Maybe from platform ID or from product ID.) | ||
| - | ===== Open VisiData and Set Data Types ===== | + | ====== Open VisiData and Set Data Types ====== | 
| $ vd logs.csv active_devices.csv id_name.csv | $ vd logs.csv active_devices.csv id_name.csv | ||
| - | ==== Set Column Types, Widths, Names, Importance ==== | + | ===== Set Column Types, Widths, Names, Importance | 
| ^ Key ^ Meaning ^ | ^ Key ^ Meaning ^ | ||
| Line 25: | Line 23: | ||
| | # | Column is integer. (Set aggregator to sum.) | | | # | Column is integer. (Set aggregator to sum.) | | ||
| | + | Set aggregator, usually to " | | + | Set aggregator, usually to " | ||
| - | | ; | Extract regex to new column. Ex, '' | + | | ; | Extract regex to new column. Ex, '' | 
| | %%^%% | rename the column. Might have to be " | | %%^%% | rename the column. Might have to be " | ||
| + | | = | Use Python function to create new column. Ex, hex to dec: '' | ||
| + | | : | Split column by regex | | ||
| | - | Hide column | | | - | Hide column | | ||
| | S | Go to " | | S | Go to " | ||
| Line 34: | Line 34: | ||
| | d | Delete row. (I only use this on Sheets sheet.) | | | d | Delete row. (I only use this on Sheets sheet.) | | ||
| | , | Select all rows that match this column' | | , | Select all rows that match this column' | ||
| + | | " | Open duplicate sheet with only selected rows | | ||
| - | === Protip: Use column view to set multiple columns at once === | + | ===== Inspecting Columnar Data ===== | 
| + | |||
| + | ^ Key ^ Meaning ^ | ||
| + | | I | Describe all columns, errors, distinct, mode, mean, median, stdev, etc. | | ||
| + | | i | Add a column of incrementing numbers (useful for ' | ||
| + | | . | Requires an " | ||
| + | | O | Options to enable " | ||
| + | | F | Frequency table of row counts, or histogram if numeric_binning is true | | ||
| + | |||
| + | ====== Case Study Link: Exported CSV from PG&E ====== | ||
| + | |||
| + | Visit [[vd-pge]]. | ||
| + | |||
| + | ====== Cast Study: Merging Two Tables, logs and metadata ====== | ||
| + | |||
| + | ==== Protip: Use column view to set multiple columns at once ==== | ||
| ^ Key ^ Meaning ^ | ^ Key ^ Meaning ^ | ||
| Line 52: | Line 68: | ||
| {{: | {{: | ||
| - | ===== Join Sheets ===== | + | ====== Join Sheets | 
| Use '' | Use '' | ||
| ^ Key ^ Meaning ^ | ^ Key ^ Meaning ^ | ||
| + | |   | The key columns to join have to match names. Check and rename with '' | ||
| | S | Go to Sheets sheet | | | S | Go to Sheets sheet | | ||
| | t | (toggle) Select the sheets you want to join by key columns. | | | t | (toggle) Select the sheets you want to join by key columns. | | ||
| | & | (or g&) Join sheets. Type " | | & | (or g&) Join sheets. Type " | ||
| | gu | Next time you're in Sheets, " | | gu | Next time you're in Sheets, " | ||
| + | |||
| + | ==== Protip: Remove insignificant noisy rows ==== | ||
| + | |||
| + | ^ Key ^ Meaning ^ | ||
| + | | %%z|%% | Select rows matching Python expression. Type "blits > 10000" | | ||
| + | | " | Make a new sheet with selected rows | | ||
| And the merge of the logs and active devices sheets would look like: | And the merge of the logs and active devices sheets would look like: | ||
| Line 68: | Line 91: | ||
| Do a similar thing to join the Joined Sheet with the product Name sheet. | Do a similar thing to join the Joined Sheet with the product Name sheet. | ||
| - | ===== Process Data ===== | + | ====== Process Data ====== | 
| In your final Joined sheet, add rows that are calculations of other rows. | In your final Joined sheet, add rows that are calculations of other rows. | ||