Structural Pattern Matching In Python

Introduction

PEP 634 introduced structural pattern matching to Python. Pattern matching involves providing a pattern and an associated action to be taken if the data fits the pattern. At its simplest, pattern matching works like the switch statement in C, C++, JavaScript or Java: a subject value is matched against one or more cases. The main difference, however, is that in Python we can also deconstruct (unpack) the subject into its constituent parts and bind those parts to names.

Some of the languages that have implemented pattern matching include Haskell, Scala, Rust and Erlang. TC39 is also considering a proposal to add pattern matching to JavaScript.

Requirements

  • Python 3.10 or later. If you don't have it, you can download it from python.org.

Syntax

At a high level, the syntax looks as follows:

```python
match expression:
    case pattern1:
        # do something
    case pattern2:
        # do something else
```

expression can be a value or any valid Python expression. pattern1 and pattern2 can each be any one of the patterns below, or a combination of them:

  • Literal patterns
  • Wildcard patterns
  • Sequence patterns
  • Mapping patterns
  • Class patterns
  • Capture patterns
  • OR patterns
  • AS patterns

We will explore each of these in detail shortly. For every pattern, we will show examples of how you can use it in code.

Patterns

Literal patterns

Literal patterns match numbers (int, float, complex), strings, booleans (True, False) and None.

Consider the example below where we return the name of the day of the week:

```python
def weekday_name(weekday_num):
    if weekday_num == 1:
        return "Monday"
    elif weekday_num == 2:
        return "Tuesday"
    elif weekday_num == 3:
        return "Wednesday"
    elif weekday_num == 4:
        return "Thursday"
    elif weekday_num == 5:
        return "Friday"
    elif weekday_num == 6:
        return "Saturday"
    elif weekday_num == 7:
        return "Sunday"
    return "Invalid. Day number should be an integer between 1-7"
```

We can refactor the code above using a match statement. In this case our patterns would be integers 1 through to 7.

```python
def weekday_name(weekday_num):
    match weekday_num:
        case 1:
            return "Monday"
        case 2:
            return "Tuesday"
        case 3:
            return "Wednesday"
        case 4:
            return "Thursday"
        case 5:
            return "Friday"
        case 6:
            return "Saturday"
        case 7:
            return "Sunday"
```

Open the interpreter and run the code above. Neat, right? Let us look at other patterns.

Wildcard patterns

Sometimes we want to specify a default action if no pattern was matched. In that case, a wildcard pattern is just what we need. The wildcard pattern is written as case _:

Let us see it in action by improving our weekday_name function above to handle invalid inputs.

```python
def weekday_name(weekday_num):
    match weekday_num:
        case 1:
            return "Monday"
        case 2:
            return "Tuesday"
        case 3:
            return "Wednesday"
        case 4:
            return "Thursday"
        case 5:
            return "Friday"
        case 6:
            return "Saturday"
        case 7:
            return "Sunday"
        case _:
            return "Invalid. weekday_num should be an integer between 1 and 7"
```

_ is a wildcard pattern and matches anything that was not matched by the other case statements. In this case, the wildcard pattern will match any input that is not an integer between 1 and 7.

Sequence patterns

Sequence patterns match subjects that are instances of collections.abc.Sequence, such as lists and tuples. Strings, bytes and bytearrays are deliberately excluded, even though they are sequences.

Let's say we have a sorted list of names from a recently concluded election and we want to write a function that returns the top three candidates. However, we do not know the number of candidates beforehand. This can be easily achieved with the match statement.

```python
def rank_candidates(candidates):
    match candidates:
        case [first]:  # when num of candidates is 1
            return {"first": first}
        case [first, second]:  # when num of candidates is 2
            return {"first": first, "second": second}
        case [first, second, third]:  # when num of candidates is 3
            return {"first": first, "second": second, "third": third}
        case [first, second, third, *rest]:  # when num of candidates is >3
            return {"first": first, "second": second, "third": third}
```

Notice the syntax looks very similar to iterable unpacking. A sequence pattern can either be a fixed or variable length pattern. Fixed length patterns know the length of the sequence they are matching. Variable length patterns have a * to denote an arbitrary length. Consider our example above:

  • [first]: is a fixed length pattern. It will match a list containing one name.
  • [first, second]: is a fixed length pattern. It will match a list containing two names.
  • [first, second, third]: is a fixed length pattern. It will match a list containing three names.
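  • [first, second, third, *rest]: is a variable length pattern. On its own it matches a list of three or more names (rest may be empty). Because the three-name case comes before it, here it only handles lists containing more than three names, however long.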

A sequence pattern may have at most one *.

Mapping patterns

Mapping patterns match mappings which are instances of collections.abc.Mapping. This includes the Python dict. Mapping patterns match mappings using their keys.

Say we want to get the names of all repositories owned by the python organization on GitHub. We can obtain a list of the repositories from the endpoint api.github.com/orgs/python/repos. We will use pattern matching to print the names of repositories that use the Python language.

Install the requests library.

```bash
pip install requests
```

Before Python 3.10, we could only implement the above using an if statement as shown below:

```python
import requests

response = requests.get("https://api.github.com/orgs/python/repos")
repositories = response.json()

for repo in repositories:
    language = repo.get("language")
    name = repo.get("name")
    if language == "Python":
        print(name)
```

Using Python 3.10, we can rewrite the above for loop to use a match statement instead.

```python
for repo in repositories:
    match repo:
        case {"language": "Python", "name": name}:
            print(name)
```

{"language": "Python", "name": name} is a mapping pattern that says:

  • The dictionary representing a repository should have the keys "language" and "name".
  • The value of the "language" key should be equal to "Python".
  • Store the value of the "name" key in a variable called name.
  • Ignore all the other keys. Any key that is not included in the pattern will be ignored while matching.

With mapping patterns, we do not have to write code that validates the dictionary before extracting the values we want from the dict since we can specify the pattern we want.

Class patterns

A common task in Python is checking whether an object is an instance of a particular class before performing an operation. Python's pattern matching can also match classes, allowing us to check an object's type. Any class can be matched, even the built-in classes. Class patterns fulfill two purposes: checking whether a given subject is indeed an instance of a specific class, and extracting data from specific attributes of the subject.

Consider the function below that sums two integers. We only want to perform the addition if both numbers are of type int.

```python
def sum_two_integers(num1, num2):
    if isinstance(num1, int) and isinstance(num2, int):
        return num1 + num2
```

We can remove the isinstance checks by using the match statement.

```python
def sum_two_integers(num1, num2):
    match num1, num2:
        case int(num1), int(num2):
            return num1 + num2
```

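The pattern int(num1), int(num2) is a sequence pattern made up of two class patterns. It checks whether num1 and num2 are both instances of int and, if so, binds them back to the names num1 and num2. If either value is not an int, no case matches and the function returns None, just like the isinstance version.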

Capture patterns

Capture patterns help us to "capture" value(s) from the subject. A capture pattern is a bare name: it always matches, and it binds the matched value to a variable with that name.

Consider the code below:

```python
name = "Tarzan Mbogi Genje"

match name.split():
    case first_name, second_name, third_name:
        print(f"First name: {first_name}")
```

The code above will print "First name: Tarzan".

name.split() returns a list, ['Tarzan', 'Mbogi', 'Genje']. first_name, second_name, third_name is a sequence pattern made up of three capture patterns. Each one captures an element of the list and stores it in the variable first_name, second_name or third_name respectively.

OR patterns

OR patterns consist of two or more patterns separated by vertical bars. For the OR pattern to succeed, at least one of its subpatterns must match; they are tried from left to right. If none of them match, the OR pattern fails.

Consider the example below that checks whether a value is a string or a number.

```python
def check_type(val):
    match val:
        case str(val):
            return "String"
        case int(val) | float(val):
            return "Number"
```

int(val) | float(val) is an OR pattern that consists of two other patterns. The first pattern matches instances of int and the second pattern matches instances of float. If any of them is matched, the match succeeds and we return "Number".

In an OR pattern, all sub patterns must bind to the same variables. In our example above, the sub patterns bind to val.

AS patterns

AS patterns allow us to specify a structure constraint and bind to a value at the same time.

In the check_type function that we used to demonstrate OR patterns, notice we repeated the variable val. We can avoid this by rewriting our pattern as an AS pattern:

```python
def check_type(val):
    match val:
        case str() as val:
            return "String"
        case int() | float() as val:
            return "Number"
```

In the above case, we store the subject in a variable called val. AS patterns are excellent for binding values in an OR pattern.

Guards: Adding conditions to patterns

So far, we have seen that pattern matching works by imposing structural constraints on a subject and binding parts of the subject to names. However, we may still need to perform more checks or filtering using boolean expressions. A case whose pattern is followed by an if condition is said to have a guard. The guard is only evaluated after the pattern has matched, so it can use the names that the pattern has bound.

Let's say we wanted to print the names of all repositories under the Python organization that have more than 1000 stars. In the repository data returned by the GitHub API, a repository has a key, stargazers_count, which stores the number of people who have starred that repository.

We could therefore write our code as follows:

```python
import requests

response = requests.get("https://api.github.com/orgs/python/repos")
repositories = response.json()

for repo in repositories:
    match repo:
        case {"name": name, "stargazers_count": stars} if stars > 1000:
            print(name)
```

{"name": name, "stargazers_count": stars} if stars > 1000 is a mapping pattern that has a guard. Only the names of repositories that have a stargazers_count of more than 1000 will be printed.

Conclusion

We have learnt about pattern matching in Python and seen some of the powerful patterns supported by Python. If you want to learn more about pattern matching, you can read the official specification in PEP 634 and the motivation and rationale in PEP 635.
