pythonpytest
Ben Gorman

Ben Gorman

Life's a garden. Dig it.

You own an ecommerce store that sells costumes for dogs 🐶. You'd like to analyze your web traffic by measuring the count of unique visitors amongst your web sessions data. Additionally, you want the ability to filter by date, so you can answer questions like

How many visitors visited my site?

and

How many visitors visited my site on February 20th, 2022?

You come up with the following count_visitors() function that you place inside measures.py.

measures.py
def count_visitors(sessions, date=None):
    """
    Count the number of unique visitors
 
    :param sessions: list of sessions were each session has a visitor_id and date
    :param date: optional filter by date
    :return: number of unique visitors
    """
 
    if date is None:
        visitors = {x.visitor_id for x in sessions}
    else:
        visitors = {x.visitor_id for x in sessions if x.date == date}
 
    return len(visitors)

To test it, you create test_measures.py as follows.

test_measures.py
from datetime import date
from types import SimpleNamespace
import time
 
from measures import count_visitors
 
def load_sessions():
    """Super expensive function that loads the sessions data"""
 
    print("loading sessions data.. hold on a sec")
    time.sleep(5)
 
    sessions = [
        SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-02')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-02')),
        SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-04')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-10')),
        SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-10')),
        SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-10')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-11')),
        SimpleNamespace(visitor_id=128, date=date.fromisoformat('2022-02-15')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-17')),
        SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-17'))
    ]
 
    return sessions
 
 
def test_count_visitors():
    """Confirm that count_visitors works as expected"""
 
    sessions = load_sessions()
    assert count_visitors(sessions) == 4
 
 
def test_count_visitors_on_date():
    """Confirm that count_visitors works as expected when a date filter is used"""
 
    sessions = load_sessions()
    assert count_visitors(sessions, date=date.fromisoformat('2022-02-17')) == 2

You have a problem.. Both of your test functions call the load_sessions() function in order to load the sessions data, but this data is really expensive (slow) to fetch. (In theory, load_sessions() might connect to a database and run an expensive query.) See if you can cut your tests runtime in half 😉.

Run with messages

Notice the print() statement inside the load_sessions() function. To view the output of this print statement in the console, run the tests with pytest -s as opposed to simply pytest. The -s flag tells pytest not to capture the result of standard out as it normally does.

pytest output: pytest_without_s-flag

pytest -s output: pytest_with_s-flag

Directory Structure

dog_costumes/
  measures.py
  test_measures.py

Solution

from datetime import date
from types import SimpleNamespace
import time
import pytest
 
from measures import count_visitors
 
@pytest.fixture(scope="module")
def load_sessions():
    """Super expensive function that loads the sessions data"""
 
    print("loading sessions data.. hold on a sec")
    time.sleep(5)
 
    sessions = [
        SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-02')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-02')),
        SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-04')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-10')),
        SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-10')),
        SimpleNamespace(visitor_id=123, date=date.fromisoformat('2022-02-10')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-11')),
        SimpleNamespace(visitor_id=128, date=date.fromisoformat('2022-02-15')),
        SimpleNamespace(visitor_id=925, date=date.fromisoformat('2022-02-17')),
        SimpleNamespace(visitor_id=372, date=date.fromisoformat('2022-02-17'))
    ]
 
    return sessions
 
 
def test_count_visitors(load_sessions):
    """Confirm that count_visitors works as expected"""
 
    assert count_visitors(load_sessions) == 4
 
 
def test_count_visitors_on_date(load_sessions):
    """Confirm that count_visitors works as expected when a date filter is used"""
 
    assert count_visitors(load_sessions, date=date.fromisoformat('2022-02-17')) == 2

Now when we run pytest -s, we get the following output:

pytest_with_fixture

Notice the message "loading sessions data.. hold on a sec" appears once - not twice as it did originally. This indicates that the expensive load_sessions() function only ran once.

Explanation

The trick to this solution is to make use of pytest's fixture decorator which allows us to cache the output value of a function. In this case, we cache the output of load_sessions() so we can use it in test_count_visitors() and then reuse it in test_count_visitors_on_date(). To make this work, note a few things:

  1. We have to import pytest

  2. We have to decorate load_sessions() with @pytest.fixture(scope="module").

    @pytest.fixture(scope="module")
        def load_sessions():
            """Super expensive function that loads the sessions data"""
            ...

    scope="module" tells pytest it can reuse the data within the same module (python file). However, if we were to reference the load_sessions() function from a different module, the function would be re-executed.

  3. We have to pass load_sessions into its dependent test functions as a parameter, and tweak the internals accordingly.

    def test_count_visitors(load_sessions):
        """Confirm that count_visitors works as expected"""
     
        assert count_visitors(load_sessions) == 4
     
     
    def test_count_visitors_on_date(load_sessions):
        """Confirm that count_visitors works as expected when a date filter is used"""
     
        assert count_visitors(load_sessions, date=date.fromisoformat('2022-02-17')) == 2