# A tibble: 4 × 4
muppet_id names role wage_per_hour
<dbl> <chr> <chr> <dbl>
1 1 Kermit the Frog pragmatic leader 75
2 2 Miss Piggy diva pig 85
3 3 Fozzie the Bear stand-up comic 25.2
4 4 Gonzo stunt performer 54.5
What is an inner join?
An inner join keeps only rows from two tables that have matches. This Venn diagram illustrates which rows are included in an inner join.
How does an inner join work?
An inner join visits every row in the left table and then checks if there is a matching row in the right table. If there is, it combines the fields from the matching left and right row. If there is not a match, it drops the row. Thus only the “intersection” of rows from both tables is kept in the result.
How many records end up in the result of an inner join?
\(L =\) # of records in the left table
\(R =\) # of records in the right table
Then the number of rows in the resulting inner join is \(min(L,R)\)
Notice how the last row in the left table (id=5
, name="Rowlf the Dog
) and the last row in the right table (id=-8
, role="frenzied monster drummer"
, wage_per_hour=45.75
) were dropped.
The Data School provides a nice animation of the row by row operations for an inner join.
Implementing an inner join in different tools
Here is how you perform an inner join in R, Python, Power BI, and Excel.
R
Python
Power BI
In Power BI, we invoke Power Query through the Home >> Transform data menu item.
Excel
In native Excel, there is not a true inner join. The best we can do is mimic an inner join in two steps. First, find the set of intersecting values from the primary keys (ids) from both tables. We can use the MATCH()
function to do this. Second, use that set of intersecting keys to merge in the columns together by using VLOOKUP()
repeatedly.
:::{.callout-note} You can invoke Power Query in Excel (through the Data >> Get Data menu item) to perform a true inner join. :::