Inner Join

Author

Matt Pickard

What is an inner join?

An inner join keeps only rows from two tables that have matches. This Venn diagram illustrates which rows are included in an inner join.

How does an inner join work?

An inner join visits every row in the left table and then checks if there is a matching row in the right table. If there is, it combines the fields from the matching left and right row. If there is not a match, it drops the row. Thus only the “intersection” of rows from both tables is kept in the result.

How many records end up in the result of an inner join?

\(L =\) # of records in the left table

\(R =\) # of records in the right table

Then the number of rows in the resulting inner join is \(min(L,R)\)

Left Table

Right Table

Inner Joined Result

Notice how the last row in the left table (id=5, name="Rowlf the Dog) and the last row in the right table (id=-8, role="frenzied monster drummer", wage_per_hour=45.75) were dropped.

The Data School provides a nice animation of the row by row operations for an inner join.

Implementing an inner join in different tools

Here is how you perform an inner join in R, Python, Power BI, and Excel.

R

library(dplyr)

employees %>% 
  inner_join(positions, by = "muppet_id")
# A tibble: 4 × 4
  muppet_id names           role             wage_per_hour
      <dbl> <chr>           <chr>                    <dbl>
1         1 Kermit the Frog pragmatic leader          75  
2         2 Miss Piggy      diva pig                  85  
3         3 Fozzie the Bear stand-up comic            25.2
4         4 Gonzo           stunt performer           54.5

Python

import pandas as pd

pd.merge(employees, positions, how='inner')
   muppet_id            names              role  wage_per_hour
0        1.0  Kermit the Frog  pragmatic leader          75.00
1        2.0       Miss Piggy          diva pig          85.00
2        3.0  Fozzie the Bear    stand-up comic          25.25
3        4.0            Gonzo   stunt performer          54.50

Power BI

In Power BI, we invoke Power Query through the Home >> Transform data menu item.

Excel

In native Excel, there is not a true inner join. The best we can do is mimic an inner join in two steps. First, find the set of intersecting values from the primary keys (ids) from both tables. We can use the MATCH() function to do this. Second, use that set of intersecting keys to merge in the columns together by using VLOOKUP() repeatedly.

:::{.callout-note} You can invoke Power Query in Excel (through the Data >> Get Data menu item) to perform a true inner join. :::