How to use oTree + Prolific for a longitudinal study with cumulative progress (for beginners)

This is a practical blueprint for running repeated waves on Prolific while keeping participant progress and data consistent inside oTree. The goal is simple: recruit once, return the same people, and keep their progress intact without manual fixes.

Why this setup works

Longitudinal online experiments are fragile. You need stable identifiers, clean wave scheduling, and a predictable participant flow. This short article shows how to combine an oTree setup with Prolific features to get all three.

I assume you will invite the same pool back for each wave. If you are using new participants each wave, drop the progress logic and keep only the Prolific integration.

Core architecture

Think in three stable layers:

  1. Prolific provides the participant ID and handles recruitment and re-contact.
  2. oTree stores a permanent key and wave status in participant.vars.
  3. You use a permanent link (and key) to identify the participant across waves.

The minimal persistent key is PROLIFIC_PID. Keep it in every wave. When you create a new oTree session and deliver it via Prolific, oTree always has access to the ID via participant.label, as long as you pass the Prolific ID in the URL.
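
As a minimal sketch of layer 2, the key can be copied from participant.label into participant.vars on the first page. The helper name remember_prolific_id and the 'prolific_pid' key are illustrative assumptions, not part of oTree; only participant.label and participant.vars are oTree attributes. The stand-in object exists only so that the snippet runs outside oTree.

```python
from types import SimpleNamespace


def remember_prolific_id(participant):
    """Copy the Prolific ID from participant.label into participant.vars.

    Hypothetical helper: the name and the 'prolific_pid' key are assumptions.
    Returns the stored ID, or None when no label was passed in the URL.
    """
    pid = participant.label  # filled by oTree from ?participant_label=...
    if pid:
        # setdefault keeps the first value, so a later call cannot overwrite it
        participant.vars.setdefault('prolific_pid', pid)
    return participant.vars.get('prolific_pid')


# Stand-in for an oTree participant, used only to make the sketch runnable
demo = SimpleNamespace(label='PID_EXAMPLE_123', vars={})
stored = remember_prolific_id(demo)
```

Keeping the ID under a fixed key means later waves can read it without depending on how the link was opened.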

Implementation details

I recommend deploying a longitudinal study (with several waves) as one study rather than several. A single study on oTree is easier to manage because you do not have to merge data across studies. If you do not need to track cumulative progress and you are fine with merging separate dataframes, deploying multiple studies on oTree works too. However, if you need cumulative progress (i.e., participants' responses in the first wave determine their tasks in later waves), then you have to use one single study for all waves. This is especially useful for designs involving experience accumulation, learning tasks, and intertemporal choices.

To implement this, Prolific, oTree, and Heroku have to work together. Since this is a tutorial for beginners, I will use Dallas' blog post "Integrating oTree with Prolific" as a basis.

oTree side

Needless to say, your oTree code has to work correctly. The one thing to consider on top of that is timing. To handle a longitudinal study, think of it as a single-session game with multiple rounds in which some rounds are separated by a long break (e.g., a week). On Prolific in particular, you should not expect everyone to start simultaneously.

A good strategy is therefore to write functions so that, when a participant reaches a bridge point (i.e., the end of a wave), the current timestamp is checked and the start time of the next wave is set for this participant. Before that time, the participant keeps seeing the bridge page. After it, the bridge page is no longer shown and the participant continues with the next wave. This way, you do not need to intervene manually to manage who can join the next wave.

To do this, first define which rounds belong to each wave. For example, you can store the round numbers in Constants so that oTree knows when it should schedule the next wave.
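
A possible layout for these round numbers is sketched below. The attribute names initial_evaluation_rounds, continuation_rounds, and num_rounds match the code used later in this article, but the values and the helper wave_of_round are assumptions for illustration. In a real app the class would subclass oTree's BaseConstants; a plain class is used here so the snippet runs on its own.

```python
class Constants:
    """Stand-in for the app's Constants (in oTree it would subclass BaseConstants)."""
    initial_evaluation_rounds = 3      # assumed: wave 1 covers rounds 1-3
    continuation_rounds = [4, 5, 6]    # assumed: wave 2 covers rounds 4-6
    num_rounds = 6                     # total rounds across both waves


def wave_of_round(round_number):
    """Return the wave (1 or 2) that a round belongs to (hypothetical helper)."""
    if 1 <= round_number <= Constants.initial_evaluation_rounds:
        return 1
    if round_number in Constants.continuation_rounds:
        return 2
    raise ValueError(f'round {round_number} is not assigned to a wave')
```

With this layout, the last round of wave 1 is Constants.initial_evaluation_rounds and the first round of wave 2 is Constants.continuation_rounds[0].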

Specifically, define the following static method on the last page of each wave:

    @staticmethod
    def before_next_page(player, timeout_happened):
        # Task-specific bookkeeping; store_cutoff_choice and choice_lottery
        # belong to the rest of the app and are not shown in this article
        store_cutoff_choice(player, choice_lottery)
        # On the last round of wave 1, schedule the start of wave 2
        if player.round_number == Constants.initial_evaluation_rounds:
            schedule_session_start(
                player,
                prefix='session2',
                wait_seconds=15,  # set to your desired waiting time (in seconds)
                future_round=Constants.continuation_rounds[0],  # the first round of the next wave
            )

Here, the function schedule_session_start is a custom function that you write to check the current time and set the next available wave for this participant.

Here is an example of schedule_session_start:


from datetime import datetime, timedelta


def schedule_session_start(player, prefix, wait_seconds, future_round):
    """Store the scheduled start time for the next session and propagate it."""
    participant = player.participant
    existing_ts = participant.vars.get(f'{prefix}_start')
    existing_readable = participant.vars.get(f'{prefix}_start_readable')
    if existing_ts is not None:
        if existing_readable is None:
            try:
                existing_readable = datetime.fromtimestamp(existing_ts).strftime('%A, %B %d')
            except (TypeError, OSError, ValueError):
                existing_readable = None
        setattr(player, f'{prefix}_start', existing_ts)
        if existing_readable is not None:
            setattr(player, f'{prefix}_start_readable', existing_readable)
            participant.vars[f'{prefix}_start_readable'] = existing_readable
        if future_round and future_round <= Constants.num_rounds:
            future_player = player.in_round(future_round)
            setattr(future_player, f'{prefix}_start', existing_ts)
            if existing_readable is not None:
                setattr(future_player, f'{prefix}_start_readable', existing_readable)
        return existing_ts, existing_readable

    t = datetime.now() + timedelta(seconds=wait_seconds)
    start_ts = t.timestamp()
    readable = t.strftime('%A, %B %d')
    setattr(player, f'{prefix}_start', start_ts)
    setattr(player, f'{prefix}_start_readable', readable)
    participant.vars[f'{prefix}_start'] = start_ts
    participant.vars[f'{prefix}_start_readable'] = readable
    if future_round and future_round <= Constants.num_rounds:
        future_player = player.in_round(future_round)
        setattr(future_player, f'{prefix}_start', start_ts)
        setattr(future_player, f'{prefix}_start_readable', readable)
    return start_ts, readable

If you do not need to copy the start time onto the player of the future round, you can simplify the function by removing the propagation logic.
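
A simplified variant without propagation could look like the sketch below. It is not the version used in the article: the name schedule_session_start_simple is an assumption, it takes the participant instead of the player, and it stores the values only in participant.vars. The stand-in object exists only so that the snippet runs outside oTree.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


def schedule_session_start_simple(participant, prefix, wait_seconds):
    """Set the next wave's start time once and return (timestamp, readable date).

    Simplified sketch: values live only in participant.vars, and a start time
    that already exists is never rescheduled.
    """
    key_ts = f'{prefix}_start'
    key_readable = f'{prefix}_start_readable'
    if key_ts not in participant.vars:
        t = datetime.now() + timedelta(seconds=wait_seconds)
        participant.vars[key_ts] = t.timestamp()
        participant.vars[key_readable] = t.strftime('%A, %B %d')
    return participant.vars[key_ts], participant.vars[key_readable]


# Stand-in participant: the second call must not move the start time
demo = SimpleNamespace(vars={})
first = schedule_session_start_simple(demo, 'session2', wait_seconds=7 * 24 * 3600)
second = schedule_session_start_simple(demo, 'session2', wait_seconds=15)
```

Because the first scheduled time wins, a participant who reloads the last page of a wave cannot shorten or extend their own waiting period.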

With these in place, implement the logic for the bridge page. Here is an example:

class BridgeSession2(Page):
    @staticmethod
    def is_displayed(player):
        return should_show_bridge(player, Constants.continuation_rounds[0], 'session2')

    @staticmethod
    def vars_for_template(player):
        return build_bridge_context(player, current_session=1, next_session=2, prefix='session2')

should_show_bridge keeps participants on the bridge page until the scheduled start time, while build_bridge_context supplies the template with timing variables for display.


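import time


def should_show_bridge(player, expected_round, prefix):
    """Return True when the participant should remain on the bridge page."""
    # The bridge only exists in the first round of the next wave
    if player.round_number != expected_round:
        return False
    # get_session_start_info is a custom helper returning (start_ts, readable)
    start_ts, _ = get_session_start_info(player, prefix)
    if start_ts is None:
        return False  # nothing scheduled, so do not hold the participant
    # Keep showing the page until the scheduled start time has passed
    return time.time() < start_ts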


def build_bridge_context(player, current_session, next_session, prefix):
    """Prepare template data shared by both bridge pages."""
    start_ts, readable = get_session_start_info(player, prefix)
    return {
        'this_session': current_session,
        'next_session': next_session,
        'wait_timestamp': start_ts,
        'wait_readable': readable,
        'server_timestamp': time.time(),
    }
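
Both functions rely on get_session_start_info, which is not listed in this article. One way it might be written, assuming it simply reads back what schedule_session_start stored in participant.vars, is sketched here. The helper seconds_until_start is a hypothetical addition that shows how the two timestamps passed to the template can be turned into a remaining waiting time.

```python
import time
from types import SimpleNamespace


def get_session_start_info(player, prefix):
    """Return (start_timestamp, readable_date) for the wave named by prefix.

    Assumed behaviour: read the values stored in participant.vars; both are
    None when nothing has been scheduled yet.
    """
    pvars = player.participant.vars
    return pvars.get(f'{prefix}_start'), pvars.get(f'{prefix}_start_readable')


def seconds_until_start(player, prefix, now=None):
    """Seconds left before the wave opens, never negative (hypothetical helper)."""
    start_ts, _ = get_session_start_info(player, prefix)
    if start_ts is None:
        return 0
    now = time.time() if now is None else now
    return max(0, start_ts - now)


# Stand-in player whose next wave opens at timestamp 1000
demo_player = SimpleNamespace(
    participant=SimpleNamespace(
        vars={'session2_start': 1000.0, 'session2_start_readable': 'Monday, January 01'}
    )
)
```

Comparing against the server's clock rather than the browser's avoids problems with participants whose device time is wrong.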

Other setup steps, such as passing Prolific IDs via URL parameters, remain the same as in Dallas' blog post, which is also the most common approach.

Heroku side

I recommend using Heroku + oTree Hub to host oTree studies. Heroku provides a stable and scalable environment for running web applications, which is essential for online experiments, and oTree Hub simplifies deployment, so beginners can get started without worrying about server configuration. Back in the day, I had to set up servers manually, which was time-consuming and error-prone. With Heroku + oTree Hub, I can focus on designing experiments rather than dealing with technical issues. If you are a student, you can also claim free Heroku credits through the GitHub Student Developer Pack.

I do not recommend using oTree HR to host a longitudinal study run through Prolific, because HR is mainly designed for simple studies with one-time participation.

Heroku needs a few settings. First, you will maintain your study for a long time, so choose a plan that offers sufficient uptime and resources. The important decision is the dyno type. The most common choices are the Basic and Standard dynos (depending on your budget and expected traffic). Do not choose the Eco dyno: Eco dynos sleep after 30 minutes of inactivity, which can disrupt participant sessions and lead to data loss. Basic and Standard dynos stay awake, which gives participants a smoother experience. Heroku's documentation on dyno types has more information.

A few more words on server load. If you expect a large number of participants to join simultaneously (e.g., during peak hours or specific recruitment windows), consider a larger dyno to handle the load. This helps prevent server slowdowns or crashes. For most longitudinal studies with staggered participation, however, the Standard 1X dyno should suffice. If you really expect high traffic, consider deploying your own server or using a Performance dyno, and always use oTree's automatic bots feature to stress test your current Heroku subscription. I think most researchers do not need to go this far.

Finally, if you are unsure how your participant flow will affect your server, you can configure the study on Prolific so that only a limited number of participants can take part at the same time. This helps you manage server load and gives participants a better experience.

Prolific side

Prolific is very user-friendly for running online experiments, so there is not much to say on the Prolific side. I want to thank Jodie from Prolific for reading my very long emails and getting back to me quickly with useful suggestions. Most of the techniques here may not be new, but you have to consider the whole pipeline.

Prolific now has a feature called "Project". I recommend using it to manage your longitudinal study. It lets all participants see how many waves remain, and they can use one single link to join all waves.

Prolific can pass URL parameters to oTree. Make sure you always pass the participant_label parameter, filled with the Prolific ID, because this is what maintains participant identity across waves. Use the same Prolific ID in each wave so that oTree can identify returning participants and retrieve their progress from previous waves. Check Dallas' blog post for details on how to set this up. Prolific can also pass STUDY_ID and SESSION_ID if you include them in the URL, but oTree won't use those unless you store them yourself.
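
The study URL entered on Prolific can be assembled as in the sketch below. The placeholders in double curly braces are the ones Prolific substitutes for each participant. The helper name build_study_url and the base URL in the usage note are assumptions, so replace the base with your own oTree link.

```python
# Placeholders that Prolific replaces when it sends a participant to the study
PROLIFIC_PLACEHOLDERS = {
    'participant_label': '{{%PROLIFIC_PID%}}',  # the parameter oTree reads
    'STUDY_ID': '{{%STUDY_ID%}}',               # optional, ignored by oTree
    'SESSION_ID': '{{%SESSION_ID%}}',           # optional, ignored by oTree
}


def build_study_url(base_url):
    """Append the Prolific placeholders to an oTree link (hypothetical helper).

    The placeholders are left unencoded so that Prolific can recognise them.
    """
    query = '&'.join(f'{name}={value}' for name, value in PROLIFIC_PLACEHOLDERS.items())
    separator = '&' if '?' in base_url else '?'
    return f'{base_url}{separator}{query}'
```

For example, build_study_url('https://your-app.herokuapp.com/room/longitudinal') returns that link followed by ?participant_label={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}.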

Prolific also offers a full guide on running longitudinal studies, which is a must-read for anyone who wants to run one on Prolific. The key idea is to set up the follow-up waves as separate studies in advance. The study URL should remain the same across waves, because participants resume from where they left off. Recruit all participants in the first wave, and then use Prolific's re-contact feature to invite them back for subsequent waves. This way, the same participants are involved throughout the study.
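
Re-contacting requires the list of Prolific IDs that finished the first wave. The sketch below pulls that list from an oTree data export, assuming the export contains the participant.label column and some column that marks completion. The column name wave1_done and the sample rows are hypothetical, so adapt them to your own export.

```python
import csv
import io


def completed_wave1_ids(csv_text, done_column='wave1_done'):
    """Return the Prolific IDs (participant.label) of rows flagged as finished.

    'wave1_done' is a hypothetical column; use whatever marks completion
    in your export. Rows without a label and duplicate labels are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        label = (row.get('participant.label') or '').strip()
        if label and row.get(done_column) == '1' and label not in ids:
            ids.append(label)
    return ids


# Made-up export rows, used only to demonstrate the helper
sample_export = (
    'participant.code,participant.label,wave1_done\n'
    'abc123,PID_A,1\n'
    'def456,PID_B,0\n'
    'ghi789,,1\n'
    'jkl012,PID_C,1\n'
)
allowlist = '\n'.join(completed_wave1_ids(sample_export))
```

The resulting text has one ID per line and can be pasted wherever Prolific asks for the participants to invite back.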

Resources