Lecture 21
Structured Query Language is a special purpose language for interacting with (querying and modifying) indexed tabular data.
ANSI Standard but with dialect divergence (MySql, Postgres, SQLite, etc.)
This functionality maps very closely (but not exactly) with the data manipulation verbs present in dplyr.
SQL is likely to be a foundational skill if you go into industry - learn it and put it on your CV
DuckDB is an open-source column-oriented relational database management system (RDBMS) originally developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands and first released in 2019. The project has over 6 million downloads per month. It is designed to provide high performance on complex queries against large databases in embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases (for example, SQLite) DuckDB is not focusing on transactional (OLTP) applications and instead is specialized for online analytical processing (OLAP) workloads.
From Wikipedia - DuckDB
DuckDB is a relational database just like SQLite and can be interacted with using DBI and the duckdb package.
# A tibble: 336,776 × 19
year month day dep_time sched_dep_time dep_delay arr_time
<int> <int> <int> <int> <int> <dbl> <int>
1 2013 1 1 517 515 2 830
2 2013 1 1 533 529 4 850
3 2013 1 1 542 540 2 923
4 2013 1 1 544 545 -1 1004
5 2013 1 1 554 600 -6 812
6 2013 1 1 554 558 -4 740
7 2013 1 1 555 600 -5 913
8 2013 1 1 557 600 -3 709
9 2013 1 1 557 600 -3 838
10 2013 1 1 558 600 -2 753
# ℹ 336,766 more rows
# ℹ 12 more variables: sched_arr_time <int>, arr_delay <dbl>,
# carrier <chr>, flight <int>, tailnum <chr>, origin <chr>,
# dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
# minute <dbl>, time_hour <dttm>
# Source: SQL [?? x 3]
# Database: DuckDB 1.5.0 [root@Darwin 25.4.0:R 4.5.3/:memory:]
# Ordered by: desc(n)
origin dest n
<chr> <chr> <dbl>
1 JFK LAX 32
2 LGA ORD 30
3 LGA ATL 29
4 JFK SFO 23
5 LGA CLT 21
6 EWR ORD 18
7 JFK BOS 16
8 EWR SFO 16
9 LGA BOS 16
10 LGA DCA 16
# ℹ more rows
Dot commands are expressions that begins with . and are specific to the DuckDB CLI, some examples include:
A full list of available dot commands can be found here or listed via .help in the CLI.
┌─────────┬───────────────────┬─────────┬────────────┐
│ name │ email │ salary │ dept │
│ varchar │ varchar │ double │ varchar │
├─────────┼───────────────────┼─────────┼────────────┤
│ Alice │ alice@company.com │ 52000.0 │ Accounting │
│ Bob │ bob@company.com │ 40000.0 │ Accounting │
│ Carol │ carol@company.com │ 30000.0 │ Sales │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │
│ Eve │ eve@company.com │ 44000.0 │ Sales │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │
└─────────┴───────────────────┴─────────┴────────────┘
The format of duckdb’s output (in the CLI) is controlled via .mode - the default is duckbox, see possible output formats.
| name | email | salary | dept |
|-------|-------------------|--------:|------------|
| Alice | alice@company.com | 52000.0 | Accounting |
| Bob | bob@company.com | 40000.0 | Accounting |
| Carol | carol@company.com | 30000.0 | Sales |
| Dave | dave@company.com | 33000.0 | Accounting |
| Eve | eve@company.com | 44000.0 | Sales |
| Frank | frank@comany.com | 37000.0 | Sales |
[{"name":"Alice","email":"alice@company.com","salary":52000.0,"dept":"Accounting"},
{"name":"Bob","email":"bob@company.com","salary":40000.0,"dept":"Accounting"},
{"name":"Carol","email":"carol@company.com","salary":30000.0,"dept":"Sales"},
{"name":"Dave","email":"dave@company.com","salary":33000.0,"dept":"Accounting"},
{"name":"Eve","email":"eve@company.com","salary":44000.0,"dept":"Sales"},
{"name":"Frank","email":"frank@comany.com","salary":37000.0,"dept":"Sales"}]
INSERT INTO "table"("name",email,salary,dept) VALUES('Alice','alice@company.com',52000.0,'Accounting');
INSERT INTO "table"("name",email,salary,dept) VALUES('Bob','bob@company.com',40000.0,'Accounting');
INSERT INTO "table"("name",email,salary,dept) VALUES('Carol','carol@company.com',30000.0,'Sales');
INSERT INTO "table"("name",email,salary,dept) VALUES('Dave','dave@company.com',33000.0,'Accounting');
INSERT INTO "table"("name",email,salary,dept) VALUES('Eve','eve@company.com',44000.0,'Sales');
INSERT INTO "table"("name",email,salary,dept) VALUES('Frank','frank@comany.com',37000.0,'Sales');
We can subset and rename columns, sort results, and limit output using SELECT, ORDER BY, and LIMIT. Additionally, DISTINCT can be used to remove duplicate values.
We can filter rows using a WHERE clause
┌─────────┬───────────────────┬─────────┬────────────┐
│ name │ email │ salary │ dept │
│ varchar │ varchar │ double │ varchar │
├─────────┼───────────────────┼─────────┼────────────┤
│ Carol │ carol@company.com │ 30000.0 │ Sales │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │
└─────────┴───────────────────┴─────────┴────────────┘
┌─────────┬───────────────────┬─────────┬─────────┐
│ name │ email │ salary │ dept │
│ varchar │ varchar │ double │ varchar │
├─────────┼───────────────────┼─────────┼─────────┤
│ Carol │ carol@company.com │ 30000.0 │ Sales │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │
└─────────┴───────────────────┴─────────┴─────────┘
GROUP BY groups rows for aggregation; HAVING filters after aggregation (like WHERE but for grouped results).
┌────────────┬───────┬────────────┐
│ dept │ n │ avg_salary │
│ varchar │ int64 │ double │
├────────────┼───────┼────────────┤
│ Accounting │ 3 │ 41666.67 │
│ Sales │ 3 │ 37000.0 │
└────────────┴───────┴────────────┘
┌────────────┬───────┬────────────┐
│ dept │ n │ avg_salary │
│ varchar │ int64 │ double │
├────────────┼───────┼────────────┤
│ Accounting │ 3 │ 41666.67 │
└────────────┴───────┴────────────┘
Using DuckDB calculate the following quantities for employees.duckdb,
The total costs in payroll for this company
The number of employees in each department who earn more than $35,000
DuckDB has a neat trick in that it can treat files as tables (for supported formats), this lets you query them without having to explicitly create a table in the database.
We can also make this explicit by using the read_csv() function, which is useful if we need to use custom options (e.g. specify a different delimiter)
If we wanted to explicitly create a table from the CSV file this is also possible,
─────────── employees ───────────
───────────── main ──────────────
┌────────────────┐┌───────────────┐
│ employees ││ phone │
│ ││ │
│ name varchar ││ name varchar │
│ email varchar ││ phone varchar │
│ salary double ││ │
│ dept varchar ││ 4 rows │
│ │└───────────────┘
│ 6 rows │
└────────────────┘
It is also possible to create a view from a file - this acts like a table but the data is not copied from the file. This means the file must remain accessible and is re-read on every query, but no storage is used within the database.
─────────────────── employees ────────────────────
────────────────────── main ──────────────────────
┌────────────────┐┌───────────────┐┌───────────────┐
│ employees ││ phone ││ phone_view │
│ ││ ││ │
│ name varchar ││ name varchar ││ name varchar │
│ email varchar ││ phone varchar ││ phone varchar │
│ salary double ││ │└───────────────┘
│ dept varchar ││ 4 rows │
│ │└───────────────┘
│ 6 rows │
└────────────────┘
Tables and views can be deleted using DROP
─────────────────── employees ────────────────────
────────────────────── main ──────────────────────
┌────────────────┐┌───────────────┐┌───────────────┐
│ employees ││ phone ││ phone_view │
│ ││ ││ │
│ name varchar ││ name varchar ││ name varchar │
│ email varchar ││ phone varchar ││ phone varchar │
│ salary double ││ │└───────────────┘
│ dept varchar ││ 4 rows │
│ │└───────────────┘
│ 6 rows │
└────────────────┘
DuckDB requires an ON or USING clause for joins - JOIN alone is a syntax error.
┌─────────┬───────────────────┬─────────┬────────────┬──────────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼──────────────┤
│ Bob │ bob@company.com │ 40000.0 │ Accounting │ 919 555-1111 │
│ Carol │ carol@company.com │ 30000.0 │ Sales │ 919 555-2222 │
│ Eve │ eve@company.com │ 44000.0 │ Sales │ 919 555-3333 │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │ 919 555-4444 │
└─────────┴───────────────────┴─────────┴────────────┴──────────────┘
All standard joins are supported: LEFT JOIN, RIGHT JOIN, FULL JOIN, CROSS JOIN, SEMI JOIN, ANTI JOIN, etc.
┌─────────┬───────────────────┬─────────┬────────────┬──────────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼──────────────┤
│ Bob │ bob@company.com │ 40000.0 │ Accounting │ 919 555-1111 │
│ Carol │ carol@company.com │ 30000.0 │ Sales │ 919 555-2222 │
│ Eve │ eve@company.com │ 44000.0 │ Sales │ 919 555-3333 │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │ 919 555-4444 │
│ Alice │ alice@company.com │ 52000.0 │ Accounting │ NULL │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │ NULL │
└─────────┴───────────────────┴─────────┴────────────┴──────────────┘
┌─────────┬───────────────────┬─────────┬────────────┬──────────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼──────────────┤
│ Bob │ bob@company.com │ 40000.0 │ Accounting │ 919 555-1111 │
│ Carol │ carol@company.com │ 30000.0 │ Sales │ 919 555-2222 │
│ Eve │ eve@company.com │ 44000.0 │ Sales │ 919 555-3333 │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │ 919 555-4444 │
│ Alice │ alice@company.com │ 52000.0 │ Accounting │ NULL │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │ NULL │
└─────────┴───────────────────┴─────────┴────────────┴──────────────┘
┌─────────┬───────────────────┬─────────┬────────────┬──────────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼──────────────┤
│ Bob │ bob@company.com │ 40000.0 │ Accounting │ 919 555-1111 │
│ Carol │ carol@company.com │ 30000.0 │ Sales │ 919 555-2222 │
│ Eve │ eve@company.com │ 44000.0 │ Sales │ 919 555-3333 │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │ 919 555-4444 │
└─────────┴───────────────────┴─────────┴────────────┴──────────────┘
┌─────────┬───────────────────┬─────────┬────────────┐
│ name │ email │ salary │ dept │
│ varchar │ varchar │ double │ varchar │
├─────────┼───────────────────┼─────────┼────────────┤
│ Alice │ alice@company.com │ 52000.0 │ Accounting │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │
└─────────┴───────────────────┴─────────┴────────────┘
Tables can be nested within tables for the purpose of creating more complex queries,
┌─────────┬───────────────────┬─────────┬────────────┬─────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼─────────┤
│ Alice │ alice@company.com │ 52000.0 │ Accounting │ NULL │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │ NULL │
└─────────┴───────────────────┴─────────┴────────────┴─────────┘
┌─────────┬───────────────────┬─────────┬────────────┬──────────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼──────────────┤
│ Bob │ bob@company.com │ 40000.0 │ Accounting │ 919 555-1111 │
│ Carol │ carol@company.com │ 30000.0 │ Sales │ 919 555-2222 │
│ Eve │ eve@company.com │ 44000.0 │ Sales │ 919 555-3333 │
│ Frank │ frank@comany.com │ 37000.0 │ Sales │ 919 555-4444 │
└─────────┴───────────────────┴─────────┴────────────┴──────────────┘
A CTE (defined with WITH) is a named subquery that can be referenced by name in the main query - useful for breaking complex queries into readable steps, or reusing a subquery multiple times.
┌─────────┬───────────────────┬─────────┬────────────┬─────────┐
│ name │ email │ salary │ dept │ phone │
│ varchar │ varchar │ double │ varchar │ varchar │
├─────────┼───────────────────┼─────────┼────────────┼─────────┤
│ Alice │ alice@company.com │ 52000.0 │ Accounting │ NULL │
│ Dave │ dave@company.com │ 33000.0 │ Accounting │ NULL │
└─────────┴───────────────────┴─────────┴────────────┴─────────┘
What percentage of the total payroll does each department account for?
How much more (or less) than their department’s average salary does each employee earn?
To give us a bit more variety (and data), we have created another DuckDB database flights.duckdb that contains both nycflights13::flights and nycflights13::planes, the latter of which has details on each plane identified by tail number.
To create the database you can run,
Alternative you can use the copy provided in the exercises repo.
flights.duckdbThe database can then be opened from the terminal tab using,
Before we start we will set a couple of configuration options so that our output is readable.
We also include .timer on so that we get timings for our queries.
flights┌───────┬───────┬───────┬───┬──────────┬────────┬────────┬─────────────────────┐
│ year │ month │ day │ … │ distance │ hour │ minute │ time_hour │
│ int32 │ int32 │ int32 │ … │ double │ double │ double │ timestamp │
├───────┼───────┼───────┼───┼──────────┼────────┼────────┼─────────────────────┤
│ 2013 │ 1 │ 1 │ … │ 1400.0 │ 5.0 │ 15.0 │ 2013-01-01 10:00:00 │
│ 2013 │ 1 │ 1 │ … │ 1416.0 │ 5.0 │ 29.0 │ 2013-01-01 10:00:00 │
│ 2013 │ 1 │ 1 │ … │ 1089.0 │ 5.0 │ 40.0 │ 2013-01-01 10:00:00 │
│ 2013 │ 1 │ 1 │ … │ 1576.0 │ 5.0 │ 45.0 │ 2013-01-01 10:00:00 │
│ 2013 │ 1 │ 1 │ … │ 762.0 │ 6.0 │ 0.0 │ 2013-01-01 11:00:00 │
│ 2013 │ 1 │ 1 │ … │ 719.0 │ 5.0 │ 58.0 │ 2013-01-01 10:00:00 │
│ 2013 │ 1 │ 1 │ … │ 1065.0 │ 6.0 │ 0.0 │ 2013-01-01 11:00:00 │
│ 2013 │ 1 │ 1 │ … │ 229.0 │ 6.0 │ 0.0 │ 2013-01-01 11:00:00 │
│ 2013 │ 1 │ 1 │ … │ 944.0 │ 6.0 │ 0.0 │ 2013-01-01 11:00:00 │
│ 2013 │ 1 │ 1 │ … │ 733.0 │ 6.0 │ 0.0 │ 2013-01-01 11:00:00 │
└───────┴───────┴───────┴───┴──────────┴────────┴────────┴─────────────────────┘
10 rows use .last to show entire result 19 columns (7 shown)
Run Time (s): real 0.001 user 0.000672 sys 0.000502
planes┌─────────┬───────┬─────────────────────────┬───┬───────┬───────┬───────────┐
│ tailnum │ year │ type │ … │ seats │ speed │ engine │
│ varchar │ int32 │ varchar │ … │ int32 │ int32 │ varchar │
├─────────┼───────┼─────────────────────────┼───┼───────┼───────┼───────────┤
│ N10156 │ 2004 │ Fixed wing multi engine │ … │ 55 │ NULL │ Turbo-fan │
│ N102UW │ 1998 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N103US │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N104UW │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N10575 │ 2002 │ Fixed wing multi engine │ … │ 55 │ NULL │ Turbo-fan │
│ N105UW │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N107US │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N108UW │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N109UW │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
│ N110UW │ 1999 │ Fixed wing multi engine │ … │ 182 │ NULL │ Turbo-fan │
└─────────┴───────┴─────────────────────────┴───┴───────┴───────┴───────────┘
10 rows use .last to show entire result 9 columns (6 shown)
Run Time (s): real 0.001 user 0.000518 sys 0.000075
──────────────────── flights ─────────────────────
────────────────────── main ──────────────────────
┌──────────────────────────┐┌──────────────────────┐
│ flights ││ planes │
│ ││ │
│ year integer ││ tailnum varchar │
│ month integer ││ year integer │
│ day integer ││ type varchar │
│ dep_time integer ││ manufacturer varchar │
│ sched_dep_time integer ││ model varchar │
│ dep_delay double ││ engines integer │
│ arr_time integer ││ seats integer │
│ sched_arr_time integer ││ speed integer │
│ arr_delay double ││ engine varchar │
│ carrier varchar ││ │
│ flight integer ││ 3322 rows │
│ tailnum varchar │└──────────────────────┘
│ origin varchar │
│ dest varchar │
│ air_time double │
│ distance double │
│ hour double │
│ minute double │
│ time_hour timestamp │
│ │
│ 336776 rows │
└──────────────────────────┘
What is the total number of seats available on all of the planes that flew out of New York in 2013.
Does the following seem correct?
Why or why not?
Join and select:
EXPLAIN
┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ UNGROUPED_AGGREGATE │
│ ──────────────────── │
│ Aggregates: sum(#0) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ seats │
│ │
│ ~336,776 rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN │
│ ──────────────────── │
│ Join Type: LEFT │
│ │
│ Conditions: ├──────────────┐
│ tailnum = tailnum │ │
│ │ │
│ ~336,776 rows │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN ││ SEQ_SCAN │
│ ──────────────────── ││ ──────────────────── │
│ Table: ││ Table: │
│ flights.main.flights ││ flights.main.planes │
│ ││ │
│ Type: Sequential Scan ││ Type: Sequential Scan │
│ Projections: tailnum ││ │
│ ││ Projections: │
│ ││ tailnum │
│ ││ seats │
│ ││ │
│ ~336,776 rows ││ ~3,322 rows │
└───────────────────────────┘└───────────────────────────┘
Run Time (s): real 0.000 user 0.000328 sys 0.000022
EXPLAIN - key operators| Operator | Description |
|---|---|
SEQ_SCAN |
Read every row of a table sequentially |
FILTER |
Apply a WHERE condition to rows |
PROJECTION |
Compute and select output columns (SELECT) |
HASH_JOIN |
Join two tables by building a hash table on the smaller one |
HASH_GROUP_BY |
Group rows and compute aggregates using a hash table (GROUP BY) |
UNGROUPED_AGGREGATE |
Compute a single aggregate over all rows (no GROUP BY) |
CROSS_PRODUCT |
Cartesian product of two inputs (CROSS JOIN) |
ORDER_BY |
Sort rows (ORDER BY) |
TOP_N |
Efficiently returns the top N sorted rows (ORDER BY + LIMIT) |
EXPLAIN ANALYZE┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT sum(seats) FROM flights LEFT JOIN planes USING (tailnum);
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.0032s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ │
│ 0 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ UNGROUPED_AGGREGATE │
│ ──────────────────── │
│ Aggregates: sum(#0) │
│ │
│ │
│ │
│ 1 row │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ seats │
│ │
│ │
│ │
│ 336,776 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN │
│ ──────────────────── │
│ Join Type: LEFT │
│ │
│ Conditions: │
│ tailnum = tailnum ├──────────────┐
│ │ │
│ │ │
│ │ │
│ 336,776 rows │ │
│ 0.00s │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ TABLE_SCAN ││ TABLE_SCAN │
│ ──────────────────── ││ ──────────────────── │
│ Table: ││ Table: │
│ flights.main.flights ││ flights.main.planes │
│ ││ │
│ Type: Sequential Scan ││ Type: Sequential Scan │
│ Projections: tailnum ││ │
│ ││ Projections: │
│ ││ tailnum │
│ ││ seats │
│ ││ │
│ 0.00s ││ │
│ ││ │
│ 336,776 rows ││ 3,322 rows │
│ (0.00s) ││ 0.00s │
└───────────────────────────┘└───────────────────────────┘
Run Time (s): real 0.003 user 0.007876 sys 0.000109
For each carrier and departure airport, how much better or worse (as a %) is their average departure delay compared to the overall average?
┌─────────┬───────────────────┬────────────────┐
│ carrier │ carrier_avg_delay │ pct_vs_overall │
│ varchar │ double │ double │
├─────────┼───────────────────┼────────────────┤
│ US │ 3.78 │ -0.701 │
│ HA │ 4.9 │ -0.612 │
│ AS │ 5.8 │ -0.541 │
│ AA │ 8.59 │ -0.321 │
│ DL │ 9.26 │ -0.267 │
│ MQ │ 10.55 │ -0.165 │
│ UA │ 12.11 │ -0.042 │
│ OO │ 12.59 │ -0.004 │
│ VX │ 12.87 │ 0.018 │
│ B6 │ 13.02 │ 0.03 │
│ 9E │ 16.73 │ 0.323 │
│ WN │ 17.71 │ 0.401 │
│ FL │ 18.73 │ 0.482 │
│ YV │ 19.0 │ 0.503 │
│ EV │ 19.96 │ 0.579 │
│ F9 │ 20.22 │ 0.599 │
└─────────┴───────────────────┴────────────────┘
16 rows 3 columns
Run Time (s): real 0.003 user 0.007670 sys 0.002178
EXPLAIN ANALYZEEXPLAIN ANALYZE
WITH overall AS (
SELECT AVG(dep_delay) AS avg_delay FROM flights
)
SELECT carrier,
ROUND(AVG(dep_delay), 2) AS carrier_avg_delay,
ROUND((AVG(dep_delay) - avg_delay)
/ avg_delay, 3)
AS pct_vs_overall
FROM flights CROSS JOIN overall
GROUP BY carrier, avg_delay
ORDER BY pct_vs_overall;┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE WITH overall AS ( SELECT AVG(dep_delay) AS avg_delay FROM flights ) SELECT carrier, ROUND(AVG(dep_delay), 2) AS carrier_avg_delay, ROUND((AVG(dep_delay) - avg_delay) / avg_delay, 3) AS pct_vs_overall FROM flights CROSS JOIN overall GROUP BY carrier, avg_delay ORDER BY pct_vs_overall;
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.0029s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ │
│ 0 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│__internal_decompress_strin│
│ g(#0) │
│ #1 │
│ #2 │
│ │
│ │
│ │
│ 16 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ ORDER_BY │
│ ──────────────────── │
│ round(((avg(flights.main │
│ .flights.dep_delay) - │
│ overall.avg_delay) / │
│ overall.avg_delay), 3) ASC│
│ │
│ │
│ │
│ 16 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│__internal_compress_string_│
│ uinteger(#0) │
│ #1 │
│ #2 │
│ │
│ │
│ │
│ 16 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ carrier │
│ carrier_avg_delay │
│ pct_vs_overall │
│ │
│ │
│ │
│ 16 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│__internal_decompress_strin│
│ g(#0) │
│ #1 │
│ #2 │
│ │
│ │
│ │
│ 16 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ──────────────────── │
│ Groups: │
│ #0 │
│ #1 │
│ │
│ Aggregates: avg(#2) │
│ │
│ │
│ │
│ 16 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ carrier │
│ avg_delay │
│ dep_delay │
│ │
│ │
│ │
│ 336,776 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│__internal_compress_string_│
│ uinteger(#0) │
│ #1 │
│ #2 │
│ │
│ │
│ │
│ 336,776 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ CROSS_PRODUCT │
│ ──────────────────── │
│ ├──────────────┐
│ 336,776 rows │ │
│ 0.00s │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ TABLE_SCAN ││ UNGROUPED_AGGREGATE │
│ ──────────────────── ││ ──────────────────── │
│ Table: ││ Aggregates: avg(#0) │
│ flights.main.flights ││ │
│ ││ │
│ Type: Sequential Scan ││ │
│ ││ │
│ Projections: ││ 0.00s │
│ carrier ││ │
│ dep_delay ││ │
│ ││ │
│ ││ │
│ ││ │
│ 336,776 rows ││ 1 row │
│ 0.00s ││ (0.00s) │
└───────────────────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ──────────────────── │
│ dep_delay │
│ │
│ │
│ │
│ 336,776 rows │
│ 0.00s │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ TABLE_SCAN │
│ ──────────────────── │
│ Table: │
│ flights.main.flights │
│ │
│ Type: Sequential Scan │
│ │
│ Projections: │
│ dep_delay │
│ │
│ │
│ │
│ 336,776 rows │
│ 0.00s │
└───────────────────────────┘
Run Time (s): real 0.003 user 0.009183 sys 0.000870
system.time({
overall_avg = nycflights13::flights |>
summarise(avg_delay = mean(dep_delay, na.rm = TRUE)) |>
pull(avg_delay)
nycflights13::flights |>
group_by(carrier) |>
summarise(carrier_avg_delay = round(mean(dep_delay, na.rm = TRUE), 2), .groups = "drop") |>
mutate(pct_vs_overall = round((carrier_avg_delay - overall_avg) / overall_avg, 3)) |>
arrange(pct_vs_overall)
}) user system elapsed
0.011 0.001 0.012
The NYC TLC Trip Record Data is a large public dataset of taxi and for-hire vehicle trips in New York City, published monthly since 2009.
Each row is one trip and includes:
| Column | Description |
|---|---|
tpep_pickup_datetime / tpep_dropoff_datetime |
Trip start and end times |
trip_distance |
Distance traveled (miles) |
PULocationID / DOLocationID |
Pickup and dropoff taxi zone |
fare_amount, tip_amount, total_amount |
Fare components |
payment_type |
How the fare was paid (card, cash, etc.) |
passenger_count |
Number of passengers |
Recent years are distributed as parquet files (one per month), with each file containing millions of rows.
# A tibble: 73 × 2
path size
<chr> <fs::bytes>
1 taxi_zone_lookup.csv 12.04K
2 yellow_tripdata_2020-01.parquet 89.23M
3 yellow_tripdata_2020-02.parquet 87.87M
4 yellow_tripdata_2020-03.parquet 42.38M
5 yellow_tripdata_2020-04.parquet 4.24M
6 yellow_tripdata_2020-05.parquet 5.94M
7 yellow_tripdata_2020-06.parquet 9.07M
8 yellow_tripdata_2020-07.parquet 12.77M
9 yellow_tripdata_2020-08.parquet 15.83M
10 yellow_tripdata_2020-09.parquet 20.39M
# ℹ 63 more rows
3.38G
Parquet is a binary, columnar file format widely used in data science and analytics.
Row-oriented (CSV)
name, salary, dept
Alice, 52000, Accounting
Bob, 40000, Accounting
Carol, 30000, Sales
Reading one column requires reading every row.
Column-oriented (Parquet)
name: [Alice, Bob, Carol, ...]
salary: [52000, 40000, 30000, ...]
dept: [Accounting, Accounting, Sales, ...]
Reading one column skips all others entirely.
Key advantages over CSV: compressed (much smaller files), typed (no schema inference), fast for analytical queries (only reads needed columns). Widely used with Spark, Arrow, Pandas, and cloud data warehouses.
DuckDB can query Parquet files directly - no explicit import step needed.
Glob patterns let you query multiple files as a single table - useful for partitioned datasets.
Enable timings and create a view over all monthly Parquet files:
┌─────────────────────────────────┐
│ taxi │
│ │
│ VendorID bigint │
│ tpep_pickup_datetime timestamp │
│ tpep_dropoff_datetime timestamp │
│ passenger_count double │
│ trip_distance double │
│ RatecodeID double │
│ store_and_fwd_flag varchar │
│ PULocationID bigint │
│ DOLocationID bigint │
│ payment_type bigint │
│ fare_amount double │
│ extra double │
│ mta_tax double │
│ tip_amount double │
│ tolls_amount double │
│ improvement_surcharge double │
│ total_amount double │
│ congestion_surcharge double │
│ airport_fee integer │
└─────────────────────────────────┘
Run Time (s): real 0.001 user 0.000465 sys 0.000215
──────────── memory ─────────────
───────────── main ──────────────
┌─────────────────────────────────┐
│ taxi │
│ │
│ VendorID bigint │
│ tpep_pickup_datetime timestamp │
│ tpep_dropoff_datetime timestamp │
│ passenger_count double │
│ trip_distance double │
│ RatecodeID double │
│ store_and_fwd_flag varchar │
│ PULocationID bigint │
│ DOLocationID bigint │
│ payment_type bigint │
│ fare_amount double │
│ extra double │
│ mta_tax double │
│ tip_amount double │
│ tolls_amount double │
│ improvement_surcharge double │
│ total_amount double │
│ congestion_surcharge double │
│ airport_fee "null" │
└─────────────────────────────────┘
Create a lookup table for the numeric payment_type codes:
Run Time (s): real 0.000 user 0.000225 sys 0.000063
┌──────────────┬───────────────────┐
│ payment_type │ payment_type_desc │
│ int32 │ varchar │
├──────────────┼───────────────────┤
│ 0 │ Flex Fare trip │
│ 1 │ Credit card │
│ 2 │ Cash │
│ 3 │ No charge │
│ 4 │ Dispute │
│ 5 │ Unknown │
│ 6 │ Voided trip │
└──────────────┴───────────────────┘
Run Time (s): real 0.000 user 0.000139 sys 0.000016
What fraction of the fare is tipped on average, by payment type?
┌───────────────────┬───────────────┬───────────┐
│ payment_type_desc │ mean_tip_frac │ n │
│ varchar │ double │ int64 │
├───────────────────┼───────────────┼───────────┤
│ Flex Fare trip │ 0.0542 │ 18478107 │
│ Credit card │ 0.274 │ 161375392 │
│ Cash │ 0.0 │ 36624561 │
│ No charge │ 0.0012 │ 940580 │
│ Dispute │ 0.0009 │ 1453619 │
│ Unknown │ 0.0 │ 20 │
└───────────────────┴───────────────┴───────────┘
Run Time (s): real 0.843 user 11.130475 sys 0.086085
Average fare per mile for each pickup zone, joined with zone names:
SELECT * FROM (
SELECT
PULocationID AS pickup_zone,
ROUND(AVG(fare_amount), 2) AS fare_amount,
ROUND(AVG(trip_distance), 2) AS trip_distance,
ROUND(AVG(fare_amount / trip_distance), 2) AS fare_per_mile,
COUNT(*) AS num_rides
FROM taxi
WHERE trip_distance > 0 AND fare_amount > 0
GROUP BY PULocationID
) NATURAL LEFT JOIN (
SELECT LocationID AS pickup_zone, *
FROM read_csv('~/Scratch/nyctaxi/taxi_zone_lookup.csv')
)
ORDER BY fare_per_mile DESC;┌─────────────┬─────────────┬───────────────┬───────────────┬───────────┬────────────┬───────────────┬───────────────────────────────────────────────┬──────────────┐
│ pickup_zone │ fare_amount │ trip_distance │ fare_per_mile │ num_rides │ LocationID │ Borough │ Zone │ service_zone │
│ int64 │ double │ double │ double │ int64 │ int64 │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼───────────────┼───────────────┼───────────┼────────────┼───────────────┼───────────────────────────────────────────────┼──────────────┤
│ 1 │ 83.98 │ 4.14 │ 2338.06 │ 7245 │ 1 │ EWR │ Newark Airport │ EWR │
│ 216 │ 47.36 │ 24.97 │ 292.28 │ 74147 │ 216 │ Queens │ South Ozone Park │ Boro Zone │
│ 134 │ 42.17 │ 10.68 │ 240.28 │ 29004 │ 134 │ Queens │ Kew Gardens │ Boro Zone │
│ 197 │ 35.53 │ 38.87 │ 144.19 │ 56805 │ 197 │ Queens │ Richmond Hill │ Boro Zone │
│ 84 │ 78.67 │ 25.75 │ 119.01 │ 163 │ 84 │ Staten Island │ Eltingville/Annadale/Prince's Bay │ Boro Zone │
│ 204 │ 84.66 │ 27.6 │ 101.43 │ 144 │ 204 │ Staten Island │ Rossville/Woodrow │ Boro Zone │
│ 120 │ 34.64 │ 7.18 │ 95.25 │ 1937 │ 120 │ Manhattan │ Highbridge Park │ Boro Zone │
│ 265 │ 45.28 │ 10.03 │ 90.66 │ 283245 │ 265 │ N/A │ Outside of NYC │ N/A │
│ 207 │ 42.14 │ 6.16 │ 89.88 │ 5022 │ 207 │ Queens │ Saint Michaels Cemetery/Woodside │ Boro Zone │
│ 109 │ 56.26 │ 520.98 │ 88.84 │ 204 │ 109 │ Staten Island │ Great Kills │ Boro Zone │
│ 129 │ 27.39 │ 25.52 │ 68.58 │ 98924 │ 129 │ Queens │ Jackson Heights │ Boro Zone │
│ 30 │ 33.85 │ 8.57 │ 65.95 │ 353 │ 30 │ Queens │ Broad Channel │ Boro Zone │
│ 221 │ 38.6 │ 51.89 │ 59.04 │ 1061 │ 221 │ Staten Island │ Stapleton │ Boro Zone │
│ 96 │ 31.03 │ 6.67 │ 58.43 │ 1577 │ 96 │ Queens │ Forest Park/Highland Park │ Boro Zone │
│ 124 │ 37.46 │ 65.86 │ 55.58 │ 13304 │ 124 │ Queens │ Howard Beach │ Boro Zone │
│ 180 │ 31.82 │ 13.44 │ 53.74 │ 10302 │ 180 │ Queens │ Ozone Park │ Boro Zone │
│ 219 │ 53.8 │ 55.74 │ 48.42 │ 31840 │ 219 │ Queens │ Springfield Gardens South │ Boro Zone │
│ 206 │ 51.61 │ 383.92 │ 46.89 │ 594 │ 206 │ Staten Island │ Saint George/New Brighton │ Boro Zone │
│ 83 │ 27.44 │ 47.94 │ 44.52 │ 22221 │ 83 │ Queens │ Elmhurst/Maspeth │ Boro Zone │
│ 145 │ 30.09 │ 11.09 │ 43.2 │ 167963 │ 145 │ Queens │ Long Island City/Hunters Point │ Boro Zone │
│ · │ · │ · │ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │ · │ · │ · │
│ 141 │ 12.15 │ 2.97 │ 10.14 │ 5355630 │ 141 │ Manhattan │ Lenox Hill West │ Yellow Zone │
│ 237 │ 11.68 │ 2.34 │ 10.1 │ 10183946 │ 237 │ Manhattan │ Upper East Side South │ Yellow Zone │
│ 229 │ 12.64 │ 3.23 │ 10.05 │ 4122175 │ 229 │ Manhattan │ Sutton Place/Turtle Bay North │ Yellow Zone │
│ 79 │ 13.94 │ 4.39 │ 10.05 │ 5242428 │ 79 │ Manhattan │ East Village │ Yellow Zone │
│ 249 │ 13.83 │ 3.47 │ 9.94 │ 4640245 │ 249 │ Manhattan │ West Village │ Yellow Zone │
│ 138 │ 39.58 │ 9.73 │ 9.86 │ 5930897 │ 138 │ Queens │ LaGuardia Airport │ Airports │
│ 151 │ 13.36 │ 6.59 │ 9.86 │ 1941932 │ 151 │ Manhattan │ Manhattan Valley │ Yellow Zone │
│ 70 │ 39.13 │ 10.95 │ 9.78 │ 695627 │ 70 │ Queens │ East Elmhurst │ Boro Zone │
│ 236 │ 12.14 │ 2.78 │ 9.78 │ 9173243 │ 236 │ Manhattan │ Upper East Side North │ Yellow Zone │
│ 263 │ 12.45 │ 4.25 │ 9.77 │ 4364630 │ 263 │ Manhattan │ Yorkville West │ Yellow Zone │
│ 238 │ 12.9 │ 4.81 │ 9.69 │ 4434109 │ 238 │ Manhattan │ Upper West Side North │ Yellow Zone │
│ 239 │ 12.83 │ 3.82 │ 9.42 │ 6022092 │ 239 │ Manhattan │ Upper West Side South │ Yellow Zone │
│ 43 │ 13.44 │ 2.93 │ 9.4 │ 3457031 │ 43 │ Manhattan │ Central Park │ Yellow Zone │
│ 46 │ 40.07 │ 11.65 │ 9.36 │ 1743 │ 46 │ Bronx │ City Island │ Boro Zone │
│ 142 │ 12.9 │ 2.9 │ 9.28 │ 6821494 │ 142 │ Manhattan │ Lincoln Square East │ Yellow Zone │
│ 2 │ 51.11 │ 14.33 │ 6.69 │ 215 │ 2 │ Queens │ Jamaica Bay │ Boro Zone │
│ 105 │ 21.48 │ 5.76 │ 4.71 │ 63 │ 105 │ Manhattan │ Governor's Island/Ellis Island/Liberty Island │ Yellow Zone │
│ 199 │ 37.34 │ 9.99 │ 4.24 │ 87 │ 199 │ Bronx │ Rikers Island │ Boro Zone │
│ 176 │ 43.18 │ 12.66 │ 4.16 │ 163 │ 176 │ Staten Island │ Oakwood │ Boro Zone │
│ 110 │ 55.16 │ 16.9 │ 4.02 │ 3 │ 110 │ Staten Island │ Great Kills Park │ Boro Zone │
└─────────────┴─────────────┴───────────────┴───────────────┴───────────┴────────────┴───────────────┴───────────────────────────────────────────────┴──────────────┘
263 rows (40 shown) use .last to show entire result 9 columns
Run Time (s): real 0.526 user 6.934476 sys 0.087483
Using quantile_cont instead of AVG for a more robust estimate:
SELECT * FROM (
SELECT
PULocationID AS pickup_zone,
ROUND(quantile_cont(fare_amount, 0.5), 2) AS fare_amount,
ROUND(quantile_cont(trip_distance, 0.5), 2) AS trip_distance,
ROUND(quantile_cont(fare_amount / trip_distance, 0.5), 2) AS fare_per_mile,
COUNT(*) AS num_rides
FROM taxi
WHERE trip_distance > 0 AND fare_amount > 0
GROUP BY PULocationID
) NATURAL LEFT JOIN (
SELECT LocationID AS pickup_zone, *
FROM read_csv('~/Scratch/nyctaxi/taxi_zone_lookup.csv')
)
ORDER BY fare_per_mile DESC;┌─────────────┬─────────────┬───────────────┬───────────────┬───────────┬────────────┬───────────────┬───────────────────────────────────┬──────────────┐
│ pickup_zone │ fare_amount │ trip_distance │ fare_per_mile │ num_rides │ LocationID │ Borough │ Zone │ service_zone │
│ int64 │ double │ double │ double │ int64 │ int64 │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼───────────────┼───────────────┼───────────┼────────────┼───────────────┼───────────────────────────────────┼──────────────┤
│ 1 │ 90.0 │ 0.1 │ 625.0 │ 7245 │ 1 │ EWR │ Newark Airport │ EWR │
│ 237 │ 9.3 │ 1.3 │ 7.19 │ 10183946 │ 237 │ Manhattan │ Upper East Side South │ Yellow Zone │
│ 161 │ 11.5 │ 1.6 │ 7.13 │ 9136233 │ 161 │ Manhattan │ Midtown Center │ Yellow Zone │
│ 236 │ 10.0 │ 1.49 │ 6.88 │ 9173243 │ 236 │ Manhattan │ Upper East Side North │ Yellow Zone │
│ 234 │ 10.7 │ 1.51 │ 6.88 │ 5801824 │ 234 │ Manhattan │ Union Sq │ Yellow Zone │
│ 163 │ 11.0 │ 1.58 │ 6.87 │ 5887171 │ 163 │ Manhattan │ Midtown North │ Yellow Zone │
│ 164 │ 11.4 │ 1.56 │ 6.86 │ 4698717 │ 164 │ Manhattan │ Midtown South │ Yellow Zone │
│ 186 │ 12.1 │ 1.6 │ 6.85 │ 7208443 │ 186 │ Manhattan │ Penn Station/Madison Sq West │ Yellow Zone │
│ 100 │ 11.0 │ 1.5 │ 6.82 │ 3437441 │ 100 │ Manhattan │ Garment District │ Yellow Zone │
│ 113 │ 10.7 │ 1.59 │ 6.82 │ 2993363 │ 113 │ Manhattan │ Greenwich Village North │ Yellow Zone │
│ 162 │ 11.4 │ 1.6 │ 6.81 │ 7190207 │ 162 │ Manhattan │ Midtown East │ Yellow Zone │
│ 90 │ 10.0 │ 1.5 │ 6.75 │ 3579037 │ 90 │ Manhattan │ Flatiron │ Yellow Zone │
│ 170 │ 11.0 │ 1.59 │ 6.71 │ 6345855 │ 170 │ Manhattan │ Murray Hill │ Yellow Zone │
│ 230 │ 12.1 │ 1.7 │ 6.7 │ 6572024 │ 230 │ Manhattan │ Times Sq/Theatre District │ Yellow Zone │
│ 8 │ 42.9 │ 5.08 │ 6.68 │ 1569 │ 8 │ Queens │ Astoria Park │ Boro Zone │
│ 68 │ 12.5 │ 1.8 │ 6.67 │ 5454880 │ 68 │ Manhattan │ East Chelsea │ Yellow Zone │
│ 43 │ 10.7 │ 1.65 │ 6.58 │ 3457031 │ 43 │ Manhattan │ Central Park │ Yellow Zone │
│ 207 │ 40.8 │ 6.47 │ 6.56 │ 5022 │ 207 │ Queens │ Saint Michaels Cemetery/Woodside │ Boro Zone │
│ 141 │ 9.5 │ 1.44 │ 6.55 │ 5355630 │ 141 │ Manhattan │ Lenox Hill West │ Yellow Zone │
│ 246 │ 12.8 │ 1.88 │ 6.54 │ 3462704 │ 246 │ Manhattan │ West Chelsea/Hudson Yards │ Yellow Zone │
│ · │ · │ · │ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │ · │ · │ · │
│ 109 │ 62.89 │ 14.82 │ 3.56 │ 204 │ 109 │ Staten Island │ Great Kills │ Boro Zone │
│ 46 │ 39.5 │ 11.17 │ 3.53 │ 1743 │ 46 │ Bronx │ City Island │ Boro Zone │
│ 117 │ 46.9 │ 13.3 │ 3.51 │ 19661 │ 117 │ Queens │ Hammels/Arverne │ Boro Zone │
│ 245 │ 47.41 │ 13.79 │ 3.48 │ 253 │ 245 │ Staten Island │ West Brighton │ Boro Zone │
│ 5 │ 67.2 │ 18.99 │ 3.43 │ 900 │ 5 │ Staten Island │ Arden Heights │ Boro Zone │
│ 101 │ 36.12 │ 10.47 │ 3.41 │ 6449 │ 101 │ Queens │ Glen Oaks │ Boro Zone │
│ 86 │ 47.5 │ 14.18 │ 3.4 │ 19320 │ 86 │ Queens │ Far Rockaway │ Boro Zone │
│ 64 │ 35.08 │ 11.07 │ 3.35 │ 5026 │ 64 │ Queens │ Douglaston │ Boro Zone │
│ 115 │ 45.41 │ 13.26 │ 3.32 │ 985 │ 115 │ Staten Island │ Grymes Hill/Clifton │ Boro Zone │
│ 206 │ 55.01 │ 16.85 │ 3.31 │ 594 │ 206 │ Staten Island │ Saint George/New Brighton │ Boro Zone │
│ 214 │ 35.9 │ 10.27 │ 3.31 │ 868 │ 214 │ Staten Island │ South Beach/Dongan Hills │ Boro Zone │
│ 6 │ 3.7 │ 5.79 │ 3.28 │ 2306 │ 6 │ Staten Island │ Arrochar/Fort Wadsworth │ Boro Zone │
│ 118 │ 48.48 │ 15.12 │ 3.23 │ 811 │ 118 │ Staten Island │ Heartland Village/Todt Hill │ Boro Zone │
│ 187 │ 50.35 │ 15.5 │ 3.16 │ 212 │ 187 │ Staten Island │ Port Richmond │ Boro Zone │
│ 156 │ 56.45 │ 17.89 │ 3.05 │ 545 │ 156 │ Staten Island │ Mariners Harbor │ Boro Zone │
│ 84 │ 83.06 │ 27.86 │ 3.0 │ 163 │ 84 │ Staten Island │ Eltingville/Annadale/Prince's Bay │ Boro Zone │
│ 23 │ 63.2 │ 20.5 │ 2.88 │ 3037 │ 23 │ Staten Island │ Bloomfield/Emerson Hill │ Boro Zone │
│ 99 │ 95.5 │ 34.0 │ 2.77 │ 140 │ 99 │ Staten Island │ Freshkills Park │ Boro Zone │
│ 204 │ 88.0 │ 31.12 │ 2.71 │ 144 │ 204 │ Staten Island │ Rossville/Woodrow │ Boro Zone │
│ 44 │ 102.0 │ 38.67 │ 2.68 │ 2792 │ 44 │ Staten Island │ Charleston/Tottenville │ Boro Zone │
└─────────────┴─────────────┴───────────────┴───────────────┴───────────┴────────────┴───────────────┴───────────────────────────────────┴──────────────┘
263 rows (40 shown) use .last to show entire result 9 columns
Run Time (s): real 3.313 user 19.854592 sys 7.931715
Sta 323 - Spring 2026