Structural Pattern Matching In Python
Introduction
PEP 634 introduced structural pattern matching to Python in version 3.10. Pattern matching involves providing a pattern and an associated action to be taken if the data fits the pattern. At its simplest, pattern matching works like the switch statement in C, C++, Java or JavaScript: a subject value is matched against one or more cases. If you don't know how they work, check out this guide. The main difference, however, is that in Python we can also deconstruct (unpack) the subject into its constituent parts and bind those parts to variables.
Some of the languages that have implemented pattern matching include Haskell, Scala, Rust and Erlang. TC39 is also considering a proposal to add pattern matching to JavaScript.
Requirements
- Python 3.10 or later. If you don't have it, you can get it here
Syntax
At a high level, the syntax looks as follows:
```python
match expression:
    case pattern1:
        # do something
    case pattern2:
        # do something else
```
`expression` can be a value or any valid Python expression. `pattern1` and `pattern2` can each be any one, or a combination, of the patterns below:
- Literal patterns
- Wildcard patterns
- Sequence Patterns
- Mapping Patterns
- Class Patterns
- Capture patterns
- OR Patterns
- AS Patterns
We will explore each of these in detail shortly. For every pattern, we will show examples of how you can use it in code.
Patterns
Literal patterns
Literal patterns match numbers (int, float, complex), strings, Booleans (True, False) and None.
Consider the example below where we return the name of the day of the week:
```python
def weekday_name(weekday_num):
    if weekday_num == 1:
        return "Monday"
    elif weekday_num == 2:
        return "Tuesday"
    elif weekday_num == 3:
        return "Wednesday"
    elif weekday_num == 4:
        return "Thursday"
    elif weekday_num == 5:
        return "Friday"
    elif weekday_num == 6:
        return "Saturday"
    elif weekday_num == 7:
        return "Sunday"
    return "Invalid. Day number should be an integer between 1-7"
```
We can refactor the code above using a match statement. In this case our patterns would be integers 1 through to 7.
```python
def weekday_name(weekday_num):
    match weekday_num:
        case 1:
            return "Monday"
        case 2:
            return "Tuesday"
        case 3:
            return "Wednesday"
        case 4:
            return "Thursday"
        case 5:
            return "Friday"
        case 6:
            return "Saturday"
        case 7:
            return "Sunday"
```
Open the interpreter and run the code above. Neat, right? Let us look at other patterns.
Wildcard patterns
Sometimes we want to specify a default action if no pattern was matched. In that case, a wildcard pattern is just what we need.
The wildcard pattern is written as `case _:`.
Let us see it in action by improving our `weekday_name` function above to handle invalid inputs.
```python
def weekday_name(weekday_num):
    match weekday_num:
        case 1:
            return "Monday"
        case 2:
            return "Tuesday"
        case 3:
            return "Wednesday"
        case 4:
            return "Thursday"
        case 5:
            return "Friday"
        case 6:
            return "Saturday"
        case 7:
            return "Sunday"
        case _:
            return "Invalid. weekday_num should be an integer between 1 and 7"
```
`_` is a wildcard pattern and matches anything that was not matched by the preceding case clauses. In this case, the wildcard pattern will match any input that is not equal to one of the integers 1 through 7.
Sequence patterns
Sequence patterns match subjects that are instances of `collections.abc.Sequence`. This includes lists and tuples. Strings, bytes and bytearrays are deliberately excluded, even though they are sequences.
Let's say we have a sorted list of names from a recently concluded election and we want to write a function that returns the top three candidates. However, we do not know the number of candidates beforehand. This can be easily achieved with the match statement:
```python
def rank_candidates(candidates):
    match candidates:
        case [first]:  # when num of candidates is 1
            return {"first": first}
        case [first, second]:  # when num of candidates is 2
            return {"first": first, "second": second}
        case [first, second, third]:  # when num of candidates is 3
            return {"first": first, "second": second, "third": third}
        case [first, second, third, *rest]:  # when num of candidates is >3
            return {"first": first, "second": second, "third": third}
```
Notice the syntax looks very similar to iterable unpacking.
A sequence pattern can be either a fixed length or a variable length pattern. Fixed length patterns know the length of the sequence they are matching. Variable length patterns have a `*` to denote an arbitrary length.
Consider our example above:
- `[first]` is a fixed length pattern. It will match a list containing exactly one name.
- `[first, second]` is a fixed length pattern. It will match a list containing exactly two names.
- `[first, second, third]` is a fixed length pattern. It will match a list containing exactly three names.
- `[first, second, third, *rest]` is a variable length pattern. On its own it matches a list of three or more names, however long. Because the three-name case is tried first, here it handles lists containing more than three names.
A sequence pattern may have at most one `*`.
Mapping patterns
Mapping patterns match mappings which are instances of `collections.abc.Mapping`. This includes the Python dict. Mapping patterns match mappings using their keys.
Say we want to get the names of all repositories owned by the python organization on GitHub. We can obtain a list of the repositories from the endpoint api.github.com/orgs/python/repos. We will use pattern matching to print the names of repositories that use the Python language.
Install the requests library.
```bash
pip install requests
```
Before Python 3.10, we could only implement the above using an if statement as shown below:
```python
import requests

# Fetch the repositories of the python organization as a list of dicts
response = requests.get("https://api.github.com/orgs/python/repos")
repositories = response.json()

for repo in repositories:
    language = repo.get("language")
    name = repo.get("name")
    if language == "Python":
        print(name)
```
Using Python 3.10, we can rewrite the above for loop to use a match statement instead.
```python
for repo in repositories:
    match repo:
        case {"language": "Python", "name": name}:
            print(name)
```
`{"language": "Python", "name": name}` is a mapping pattern that says:
- The dictionary representing a repository should have the keys `"language"` and `"name"`.
- The value of the `"language"` key should be equal to `"Python"`.
- Store the value of the `"name"` key in a variable called `name`.
- Ignore all the other keys. Any key that is not included in the pattern will be ignored while matching.
With mapping patterns, we do not have to write code that validates the dictionary before extracting the values we want from the dict since we can specify the pattern we want.
Class patterns
A common task in Python is checking whether an object is an instance of a particular class before performing an operation. Python's pattern matching can also match classes, allowing us to check an object's type. Any class can be matched, even the built-in classes. Class patterns fulfill two purposes: checking whether a given subject is indeed an instance of a specific class, and extracting data from specific attributes of the subject.
Consider the function below that sums two integers. We only want to perform the addition if both numbers are of type int.
```python
def sum_two_integers(num1, num2):
    if isinstance(num1, int) and isinstance(num2, int):
        return num1 + num2
```
We can remove the isinstance checks by using the match statement:
```python
def sum_two_integers(num1, num2):
    match num1, num2:
        case int(num1), int(num2):
            return num1 + num2
```
The pattern `int(num1), int(num2)` combines two class patterns that check whether `num1` and `num2` are both instances of `int`.
Capture patterns
Capture patterns help us to "capture" value(s) from the subject. A capture pattern provides a name that is used as the name of the variable that will be used to store the value of the subject.
Consider the code below:
```python
name = "Tarzan Mbogi Genje"
match name.split():
    case first_name, second_name, third_name:
        print(f"First name: {first_name}")
```
The code above will print "First name: Tarzan".
`name.split()` returns a list, `['Tarzan', 'Mbogi', 'Genje']`.
`first_name, second_name, third_name` is a sequence pattern made up of three capture patterns, which capture the elements of the list and store them in the variables `first_name`, `second_name` and `third_name`.
OR patterns
OR patterns consist of two or more patterns separated by vertical bars (`|`). The sub patterns are tried from left to right, and the OR pattern succeeds as soon as one of them matches. If none of them match, the OR pattern fails.
Consider the example below that checks whether a value is a string or a number.
```python
def check_type(val):
    match val:
        case str(val):
            return "String"
        case int(val) | float(val):
            return "Number"
```
`int(val) | float(val)` is an OR pattern that consists of two other patterns. The first pattern matches instances of `int` and the second pattern matches instances of `float`. If either of them matches, the match succeeds and we return "Number".
In an OR pattern, all sub patterns must bind the same set of variables. In our example above, both sub patterns bind `val`.
AS patterns
AS patterns allow us to specify a structure constraint and bind to a value at the same time.
In the `check_type` function that we used to demonstrate OR patterns, notice we repeated the variable `val`. We can avoid this by rewriting our pattern as an AS pattern:
```python
def check_type(val):
    match val:
        case str() as val:
            return "String"
        case int() | float() as val:
            return "Number"
```
In the above case, we store the subject in a variable called `val`. AS patterns are excellent for binding values in an OR pattern.
Guards: Adding conditions to patterns
So far, we have seen that pattern matching works by imposing structural constraints on a subject and binding the subject, or parts of it, to names. However, we may still need to perform more checks or filtering using boolean expressions. A case whose pattern is followed by an `if` condition is said to have a guard.
Let's say we wanted to print the names of all repositories under the Python organization that have more than 1000 stars. In the repository data returned by the GitHub API, a repository has a key, `stargazers_count`, which stores the number of people who have starred that repository.
We could therefore write our code as follows:
```python
import requests

response = requests.get("https://api.github.com/orgs/python/repos")
repositories = response.json()

for repo in repositories:
    match repo:
        case {"name": name, "stargazers_count": stars} if stars > 1000:
            print(name)
```
`{"name": name, "stargazers_count": stars} if stars > 1000` is a mapping pattern that has a guard. Only the names of repositories that have a `stargazers_count` of more than 1000 will be printed.
Conclusion
We have learnt about pattern matching in Python and seen some of the powerful patterns it supports. If you want to learn more about pattern matching, you can read the official specification, PEP 634, and the motivation and rationale, PEP 635.