State selection and navigation

State selection

Once a biography is created, you may modify and enhance it by adding display bands for different states.  You may add display bands of the states for the filtered-in actors (e.g. their earnings).  You may also add display bands for linked actors (e.g. their spouses) as well as for the states of these linked actors (e.g. the educational status of the spouses).

The following set of buttons is used for state selection and navigation:

Pictures of the navigation buttons or icons

In order: Add, First, Previous, Next and Last

Add more states by using the Add button shown above and displayed to the right of the chart area.  States added using this button always appear at the bottom of the chart area.  For more functionality, use the pop-up menus over the chart area. These menus allow both insertion and deletion of states anywhere on the screen. Insertion inserts the state after the clicked position. Deletion deletes the state at the clicked position without notification, unless an actor band with dependent states below it is deleted. In that case, BioBrowser issues a warning indicating how many states will be deleted and provides an option to cancel the deletion. The filter tracking band cannot be deleted.

You may add any number of states, any number of times, subject to the limitations of your monitor. States can be added in any order, subject to maintaining the visual hierarchy of the link bands; if adding a state would break this hierarchy, the Add button for that band is disabled.  The arrows indicate the indentation in the hierarchy.

Edit/Undo Last Add can be used to repeatedly delete any number of states from the bottom up.  To automate navigation of the filtered-in actors (the topmost display band), use the Timer command from the Tools Menu, which will navigate through each actor automatically. To go to a specific object in the filter band, use the GoTo command in the Browse Menu.  All state selections and the current positions of navigation bands are saved with File/Save.

Here is an example of the Add/insert states dialog box for the demo file supplied with this application.  In this case, the Add button from the top navigation band was clicked, showing the states for the person actor.  Use extended selection to select or unselect more than one state: Ctrl-Click to select/unselect individual states, Shift-Click to select a range of states. Press the OK button when your selection is complete.

Dialog box to add or insert additional states to biography

Note above that the description of the tracking state contains the tracking condition for this actor used at database creation time. In this case, non-dominant persons (spouses) are tracked only when their marital status is married or remarried.

State navigation

Navigate by using the First, Previous, Next, and Last buttons shown above or by using a pop-up menu over a navigation band. The pop-up menu has the additional functionality of a “Go To” command.

The first navigation band always refers to the filter query.  If you add a linked actor, a new navigation band is created by BioBrowser.  This band differs from the topmost band in several ways. First, unlike the topmost display band, which shows the tracking state for the filtered-in actors, the bands for linked actors graphically display the time frame in which the related actors are linked to the filtered-in actors (as opposed to when the related actors were tracked by Modgen).  If you wish to see the tracking state, you may add it as a separate display band.

Second, for linked actors, you are permitted to navigate beyond the total count for the current set of actors within the band.  This is useful when adding the same link more than once.  For example, a person actor may be linked to multiple child actors.  This actor may have 0 to 6 children. If you wish to see certain states for the first 2 children within the same biography window, add the link to child state twice and position the navigation bands at 1 and 2 respectively. These positions are retained as you navigate from person to person.  If the current person actor has no children, this navigation still works, although the bands will show (1/0) and (2/0) and no states will be displayed.  In this case, the First button goes to 1/0, and the Last button has no effect on the position.

With small screen resolutions, you may choose to hide the navigation bands and use the pop-up menus for movement. This is illustrated below.

Illustration of using a pop-up menu for navigation

Alternate keyboard navigation is available for the top filter band.  The Filter/Browse menu contains a Go To command and the four movement commands, with Ctrl key equivalents: Ctrl+G for Go To, Ctrl+Q for First, Ctrl+W for Previous, Ctrl+E for Next and Ctrl+R for Last.


When is dynamic microsimulation the appropriate simulation approach?

Whenever we study the dynamics of a system made up of smaller scale units, microsimulation is a possible simulation approach – but when is it worth the trouble of creating thousands or millions of micro units? In this section we give three answers to this question, the first focusing on population heterogeneity, the second on the difficulty of aggregating behavioural relations, and the third on individual histories.

Population heterogeneity

Microsimulation is the preferred modeling choice if individuals are different, if differences matter, and if there are too many possible combinations of considered characteristics to split the population into a manageable number of groups.

Most classical macroeconomic theory is based on the assumption that the household sector can be represented by one representative agent. Individuals are assumed to be identical or, in the case of overlapping generation models, to differ only by age (each cohort is represented by one representative agent). However, such an approach is not applicable whenever finer-grained distributions matter. Imagine we are interested in studying the sustainability and distributional impact of a tax benefit system. If there is only one representative individual and the tax benefit system is balanced, this average person will receive in benefits and services what she pays for through taxes and social insurance contributions (with some of her work hours spent to administer the system). To model tax revenues, we have to account for the heterogeneity in the population--if income taxes are progressive, tax revenues depend not only on total income but also on its distribution. When designing tax reform, we usually aim at distributing burdens differently. We have to represent the heterogeneity of the population in the model to identify the winners and losers of reform.
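
To make this concrete, here is a minimal sketch with an invented two-bracket tax schedule and made-up incomes, showing how the same total income yields different revenues under a progressive tax depending on how it is distributed:

```python
# Hypothetical two-bracket progressive tax: nothing below the threshold, 40% above it.
def tax(income, threshold=30_000, rate=0.4):
    return max(0.0, income - threshold) * rate

equal_incomes = [40_000, 40_000]      # total income 80,000
unequal_incomes = [10_000, 70_000]    # same total income 80,000

print(sum(tax(y) for y in equal_incomes))    # 8000.0
print(sum(tax(y) for y in unequal_incomes))  # 16000.0
```

Two populations with identical aggregate income produce very different revenues, so the aggregate alone is not enough.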

Microsimulation is not the only modeling choice when dealing with heterogeneity. The alternative is to group people by combinations of relevant characteristics instead of representing each person individually. This is done in cell-based models. The two approaches have a direct analogy in how data are stored: a set of individual records versus a cross-classification table in which each cell corresponds to a combination of characteristics. A population census can serve as an example. If we were only interested in age and sex breakdowns, a census could be conducted by counting the individuals with each combination of characteristics. The whole census could be displayed in a single table stored as a spreadsheet. However, if we were to add characteristics to our description beyond age and sex, the number of table cells would grow exponentially, making this approach increasingly impractical. For example, 12 variables or characteristics with 6 levels each would force us to group our population into more than 2 billion cells (6^12 = 2,176,782,336). We would quickly end up with more cells than people. In the presence of continuous variables (e.g. income), the grouping approach becomes impossible without losing information, since we would have to band the data (e.g. by defining income brackets). The solution is to keep the characteristics of each person in an individual record – the questionnaire – and eventually a database row.
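
The combinatorial growth of the cell-based representation is easy to verify; the following sketch (with illustrative level counts) simply multiplies the number of levels of each characteristic:

```python
from math import prod

# Cells in a cross-classification table grow multiplicatively with each added characteristic.
age_sex = [111, 2]            # e.g. single years of age 0-110 by sex
twelve_vars = [6] * 12        # 12 characteristics with 6 levels each

print(prod(age_sex))          # 222 -- easily displayed as a single table
print(prod(twelve_vars))      # 2176782336 -- more cells than people
```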

These two types of data representation (a cross-classification table versus a set of individual records) correspond to the two types of dynamic simulation. In cell-based models, we update a table; in microsimulation models, we change the characteristics of every single record (and create a new record at each birth event). In the first case we have to find formulas for how the occupancy of each cell changes over time; in the second we have to model individual changes over time. Both approaches aim at modeling the same processes but on different levels. Modeling on the macro level might save us a lot of work, but it is only possible under restrictive conditions, since the individual behavioural relations themselves need to be aggregated, which is not always possible. Otherwise, no formula exists for how the occupancy of each cell changes over time.
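
A minimal sketch of the two levels of modeling, using an assumed one-year survival rate: the cell-based model updates a table cell with a formula, while the microsimulation updates every individual record:

```python
import random

survival = 0.95                      # assumed one-year survival rate for this age group

# Cell-based: a formula updates the occupancy of the cell.
cell_age_30 = 1000
cell_age_31_next_year = cell_age_30 * survival          # 950.0 expected persons

# Microsimulation: the state of every single record is changed.
persons = [{"age": 30, "alive": True} for _ in range(1000)]
for p in persons:
    p["alive"] = random.random() < survival
    if p["alive"]:
        p["age"] += 1

print(cell_age_31_next_year, sum(p["alive"] for p in persons))
```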

Contrasting microsimulation with cell-based models is fruitful for understanding the microsimulation approach. In the following we further develop this comparison using population projections as an example. With a cell-based approach, if we are only interested in total population numbers by age, updating an aggregated table (a population pyramid) only requires a few pieces of information: age-specific fertility rates, age-specific mortality rates, and the age distribution in the previous period. In the absence of migration, the population of age x in period t is the surviving population from age x-1 in period t-1. For a given mortality assumption, we can directly calculate the expected future population size of age x. With a microsimulation approach, survival corresponds to an individual probability (or rate, if we model in continuous time). An assumption that 95% of an age group will still be alive in a year results in a stochastic process at the micro level--individuals can be either alive or dead. We draw a random number between 0 and 1--if it is below the 0.95 threshold, the simulated person survives. Such an exercise is called Monte Carlo simulation. Due to this random element, each simulation experiment will result in a slightly different aggregated outcome, converging to the expected value as we increase the simulated population size. This difference in aggregate results is called Monte Carlo variation, which is a typical attribute of microsimulation.
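
The Monte Carlo step and its consequences can be sketched as follows, assuming the 95% survival probability from the text: each run gives a slightly different aggregate outcome, and the outcomes converge to the expected value as the simulated population grows.

```python
import random

def share_surviving(population_size, survival_prob=0.95):
    # One uniform draw per simulated person; survival if the draw falls below the threshold.
    return sum(random.random() < survival_prob for _ in range(population_size)) / population_size

# Monte Carlo variation: repeated experiments scatter around the expected value of 0.95 ...
print([round(share_surviving(1_000), 3) for _ in range(3)])
# ... and the scatter shrinks as the simulated population size increases.
print(round(share_surviving(1_000_000), 3))
```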

The problem of aggregation

Microsimulation is the adequate modeling choice if behaviours are complex at the macro level but better understood at the micro level.

Many behaviours are modeled much more easily at the micro level, as this is where decisions are taken and tax rules are defined. In many cases, behaviours are also more stable at the micro level, where there is no interference from composition effects. Even complete stability at the micro level does not automatically correspond to stability at the macro level. For example, looking at educational attainment, one of the best predictors of educational decisions is parents’ education. So if we observe an educational expansion – e.g. increasing graduation rates – at the population level, the reason is not necessarily a change of micro behaviour; it can lie entirely in the changing composition of the parents’ generation.
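
As a hedged numerical sketch (with invented graduation probabilities), the micro-level behaviour below is held completely constant, yet the aggregate graduation rate rises purely because the composition of the parents' generation changes:

```python
# Invented, unchanging micro behaviour: graduation probability by parents' education.
p_graduate = {"low": 0.2, "high": 0.6}

def aggregate_graduation_rate(parent_education_shares):
    return sum(share * p_graduate[edu] for edu, share in parent_education_shares.items())

print(round(aggregate_graduation_rate({"low": 0.8, "high": 0.2}), 2))  # 0.28
print(round(aggregate_graduation_rate({"low": 0.4, "high": 0.6}), 2))  # 0.44 -- same micro behaviour, higher aggregate rate
```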

Tax and social security regulations tie rules in a non-linear way to individual and family characteristics, impeding the aggregation of their operations. Again, there is no formula to directly calculate the effect of a reform or the sustainability of a system, even if we ignore distributive issues. To calculate total tax revenues, we need to know the composition of the population by income (progressive taxes), family characteristics (dependent children and spouses) and all other characteristics which affect the calculation of individual tax liability. Using microsimulation, we are able to model such a system at any level of detail at the micro level and to then aggregate individual taxes, contributions and benefits.

Individual histories

Microsimulation is the only modeling choice if individual histories matter, i.e. when processes possess memory.

School dropout is influenced by previous dropout experiences, mortality by smoking histories, old age pensions by individual contribution histories, and unemployment by previous unemployment spells and their durations. Processes of interest in the social sciences are frequently of this type, i.e. they have a memory. For such processes, events that have occurred in the past can have a direct influence on what happens in the future. This impedes the use of cell-based models, because once a cell is entered, all information on previous cell membership is lost. In such cases, microsimulation becomes the only available modeling option.
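
A minimal sketch of a process with memory, using invented re-employment probabilities: the chance of leaving unemployment depends on how many previous spells a person has experienced, information a cell-based model loses once the cell is entered.

```python
# Invented probabilities: re-employment chances fall with each previous unemployment spell.
def p_reemployment(previous_spells):
    return max(0.1, 0.6 - 0.15 * previous_spells)

for spells in range(4):
    print(spells, round(p_reemployment(spells), 2))   # 0.6, 0.45, 0.3, 0.15
```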


Strengths and drawbacks

The strengths of microsimulation unfold in three dimensions. Microsimulation is attractive from a theoretical point of view, as it supports innovative research embedded into modern research paradigms like the life course perspective. (In this respect, microsimulation is the logical next step following life course analysis.) Microsimulation is attractive from a practical point of view, as it can provide the tools for the study and projection of sociodemographic and socioeconomic dynamics of high policy relevance. And microsimulation is attractive from a technical perspective, since it is not restricted with respect to variable and process types, as is the case with cell-based models.

Strengths of microsimulation from a theoretical perspective

The massive social and demographic change in the last decades went hand in hand with tremendous technological progress. The ability to process large amounts of data has boosted data collection and enabled new survey designs and methods of data analysis. These developments went hand in hand with a general paradigm shift in the social sciences, many of the changes pointing in the same direction as Orcutt’s vision. Among them is the general shift from macro to micro, moving individuals within their context into the centre of research. Another change relates to the increasing emphasis on processes rather than static structures, bringing in the concepts of causality and time. While the microsimulation approach supports both of these new focuses of attention, it constitutes the main tool for a third trend in research: the move from analysis to synthesis (Willekens 1999). Microsimulation links multiple elementary processes in order to generate complex dynamics and to quantify what a given process contributes to the complex pattern of change.

These trends in social sciences are mirrored in the emergence of the life course paradigm which connects social change, social structure, and individual action (Giele and Elder 1998). Its multidimensional and dynamic view is reflected in longitudinal research and the collection of longitudinal data. Individual lives are described as a multitude of parallel and interacting careers like education, work, partnership, and parenthood. The states of each career are changed by events whose timing is collected in surveys and respectively simulated in microsimulation models. Various strengths of the microsimulation approach have a direct correspondence to key concepts of the life course perspective, making it the logical approach for the study and projection of social phenomena.

Microsimulation is well suited to simulate the interaction of careers, as it allows for both the modeling of processes that have a memory (i.e. individuals have a memory of past events of various career domains) and the modeling of various parallel careers with event probabilities or hazards of one career responding to state changes in other careers.

Besides the recognition of interactions between careers, the life course perspective emphasizes the interaction between individuals--the concept of linked lives. Microsimulation is a powerful tool to study and project these interactions. This could include changes in kinship networks (Wachter 1995), intergenerational transfers and transmission of characteristics like education  (Spielauer 2004), and the transmission of diseases like AIDS.

According to the life course perspective, the current situation and decisions of a person can be seen as the consequence of past experiences and future expectations, and as an integration of individual motives and external constraints. In this way, human agency and individual goal orientation are part of the explanatory framework. One of the main mechanisms with which individuals confront the challenges of life is the timing of life course events of parallel – and often difficult to reconcile – careers like work and parenthood. Microsimulation supports the modeling of individual agency, as all decisions and events are modeled at the level where they take place and models can account for the individual context. Besides these intrinsic strengths, microsimulation also does not impose any restrictions on how decisions are modeled, i.e. it allows for any kind of behavioural model which can be expressed in computer code.

Strengths of microsimulation from a practical perspective

The ability to create models for the projection of policy effects lies at the core of Orcutt’s vision. The attractiveness of dynamic microsimulation in policymaking is closely linked to the intrinsic strengths of this approach. It allows the modeling of policies at any level of detail, and it is prepared to address distributional issues as well as issues of long-term sustainability. A part of this power unfolds already in static tax benefit microsimulation models, which have become a standard tool for policy analysis in most developed countries. These models resulted from the increased interest among policy makers in distributional studies, but are limited to cross-sectional studies by nature. While limited tax benefit projections into the future are possible with static microsimulation models by re-weighting the individuals of an initial population to represent a future population (and by upgrading income and other variables), this approach lacks the longitudinal dimension, i.e. the individual life courses (and contribution histories) simulated in dynamic models. The importance of dynamics in policy applications was most prominently recognized in the design and modeling of pension systems, which are highly affected by population aging. Pension models are also good examples of applications where both individual (contribution) histories and the concept of linked lives (survivor pensions) matter. Another example is the planning of care institutions whose demand is driven by population aging as well as by changing kinship networks and labour market participation (i.e. the main factors affecting the availability of informal care).

Given the rapid rate of social and demographic change, the need for a longitudinal perspective has quickly been recognized in most other policy areas which benefit from detailed projections and the “virtual world” or test environment provided by dynamic microsimulation models. The longitudinal aspect of dynamic microsimulation is not only important for sustainability issues but also extends the scope of how the distributional impact of policies can be analyzed. Microsimulation can be used to analyze distributions on a lifetime basis and to address questions of intergenerational fairness. An example is the possibility to study and compare the distribution of rates of return of individual contribution and benefit histories over the whole individual lifespan.

Strengths of microsimulation from a technical perspective

From a technical point of view, the main strength of microsimulation is that it is not subject to the restrictions which are typical of other modeling approaches. Unlike cell-based models, microsimulation can handle any number of variables of any type. Compared to macro models, there is no need to aggregate behavioural relations, which, in macro models, is only possible under restrictive assumptions. With microsimulation, there are no restrictions on how individual behaviours are modeled, as it is the behavioural outcomes which are aggregated. In other words, there are no restrictions on process types. Most importantly, microsimulation allows for non-Markov processes, i.e. processes which possess a memory. Based on micro data, microsimulation allows flexible aggregation, as the information may be cross-tabulated in any form, while in aggregate approaches the aggregation scheme is determined a priori. Simulation results can be displayed and accounted for simultaneously in various ways--in aggregate time series, cross-sectional joint distributions, and individual and family life paths.

What is the price? Drawbacks and limitations

Microsimulation has three types of drawbacks (and preconceptions) which are of a very different nature: aesthetics, the fundamental limitations inherent to all forecasting, and costs.

If beauty is to be found in simplicity and mathematical elegance (a view not uncommon in mainstream economics), microsimulation models violate all rules of aesthetics. Larger scale microsimulation models require countless parameters estimated from various data sources which are frequently not easy to reconcile. Policy simulation requires tiresome accounting, and microsimulation models, due to their complexity, are always in danger of becoming black boxes that are hard to operate and understand. While there is clearly room for improvement in the documentation and user interfaces of microsimulation models, the sacrifice of elegance for usefulness will always apply to this modeling approach.

The second drawback is more fundamental. The central limitation of microsimulation lies in the fact that the degree of model detail does not go hand in hand with overall prediction power. The reason for this can be found in randomness, partly caused by the stochastic nature of microsimulation models and partly due to accumulated errors and biases in variable values. The trade-off between detail and possible bias is already present in the choice of data sources, since the sample size of surveys does not go hand in hand with the model's level of detail. There is a trade-off between the additional randomness introduced by additional variables and the misspecification errors caused by models that are too simplified. This means that the feature that makes microsimulation especially attractive, namely the large number of variables that models can include, comes at the price of randomness and prediction power that decreases as the number of variables increases. This generates a trade-off between good aggregate predictions and good predictions of distributional issues in the long run, a fact that modellers have to be aware of. This trade-off problem is not specific to microsimulation, but since microsimulation is typically employed for detailed projections, the scope for randomness becomes accordingly large. Not surprisingly, in many large-scale models some processes are aligned or calibrated towards aggregated projections obtained by external means.

Besides the fundamental nature of this type of randomness, its extent also depends on data reliability or quality. In this respect we can observe and expect various improvements as more and more detailed data becomes available for research, not only in the form of survey data but also administrative data. The latter has boosted microsimulation, especially in the Nordic European countries.

Since microsimulation produces not expected values but random variables distributed around the expected values, it is subject to another type of randomness: Monte Carlo variability. Every simulation experiment will produce different aggregate results. While this was cumbersome in times of limited computer power, many repeated experiments and/or the simulation of large populations can reduce this sort of randomness to negligible levels and deliver valuable information on the distribution of results, in addition to point estimates.

The third type of drawback is related to development costs. Microsimulation models need high-quality, longitudinal and sometimes highly specific types of data--and there are costs involved in acquiring and compiling such data. Note that such costs are not explicit costs of the microsimulation itself but represent the price to be paid for longitudinal research in general and informed policy making in particular.

Microsimulation models also usually require large investments with respect to both manpower and hardware. However, these costs can be expected to further decrease over time as hardware prices fall and more powerful and efficient computer languages become available. Still, many researchers perceive entry barriers to be high. While many do recognize the potential of microsimulation, they remain sceptical about the feasibility of its technical implementation within the framework of smaller research projects. We hope that the availability of the Modgen language lowers this perceived barrier and makes microsimulation more accessible in the research community. In the last couple of years, various smaller-scale microsimulation models have been developed alongside PhD projects or as parts of single studies. Modgen can both speed up the programming of smaller applications and provide a tested and maintained modeling platform for large-scale models, such as Statistics Canada’s LifePaths and Pohem models.


Introduction

Modgen is a microsimulation model development package developed by and distributed through Statistics Canada. It was designed to ease the creation, maintenance, and documentation of microsimulation models without requiring advanced programming skills. It accommodates many different model approaches (continuous or discrete time, case-based or time-based, general or specialized, etc.). Modgen also provides a common visual interface for each model that implements useful functionality such as scenario management, parameter input, the display of output tables from a model run, graphical output of individual biographies, and the display of detailed Modgen-generated model documentation.

In this discussion we introduce a simple microsimulation model called RiskPaths that has been implemented using Modgen. We start with a description of its underlying statistical models and then explore follow-up questions, such as what microsimulation can add to the initial statistical analysis and what other benefits microsimulation can bring to the overall analysis. We then demonstrate parts of Modgen's visual interface to examine elements of the RiskPaths model.

RiskPaths can be used as a model to study childlessness and was developed for training purposes. Technically, RiskPaths is a demographic single sex (female only), data-driven, specialized, continuous time, case-based, competing risk cohort model. It is based on a set of piecewise constant hazard regression models.
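
As a hedged illustration of the statistical building block (the hazards and age bands below are invented, not the RiskPaths estimates), a waiting time can be drawn from a piecewise constant hazard model by inverting the cumulative hazard:

```python
import math, random

# Invented piecewise constant hazards: (start_age, end_age, hazard per year).
bands = [(15, 20, 0.02), (20, 25, 0.10), (25, 30, 0.08), (30, 40, 0.04)]

def draw_age_at_event(bands):
    """Find t such that the cumulative hazard H(t) equals -ln(U), with U ~ Uniform(0,1)."""
    target = -math.log(1.0 - random.random())
    cumulative = 0.0
    for start, end, hazard in bands:
        segment = hazard * (end - start)
        if cumulative + segment >= target:
            return start + (target - cumulative) / hazard   # the event falls inside this band
        cumulative += segment
    return None   # censored: no event before the end of the last band

print(draw_age_at_event(bands))
```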

In essence, RiskPaths allows the comparison of basic demographic behaviour before and after the political and economic transitions experienced by Russia and Bulgaria around 1989. Its parameters were estimated from Russian and Bulgarian data of the Generations and Gender Survey conducted around 2003/04. Russia and Bulgaria make interesting study cases since both countries, after the collapse of socialism, underwent the biggest peacetime fertility declines ever observed. Furthermore, demographic patterns were very similar and stable in socialist times for both countries, which helps to justify the use of single cohorts as a means of comparison (one representing life in socialist times, the other the life of a post-transition cohort). In this way, the model allows us to compare demographic behaviour before and after the transition, as well as between the two countries themselves.


Introduction

In this discussion we explore the microsimulation model development package Modgen and the Modgen application RiskPaths from the model developer's point of view. We first introduce the Modgen programming environment, and then discuss basic Modgen language concepts and the RiskPaths code. Modgen requires only moderate programming skills; thus, after some training, it enables social scientists to create their own models without the need for professional programmers. This is possible because Modgen hides underlying mechanisms like event queuing and automatically creates a stand-alone model with a complete visual interface, including scenario management and model documentation (as introduced in the previous chapter). Model developers can therefore concentrate on model-specific code: the declaration of parameters, the states defining the simulated actors, and the events changing the states. This coding efficiency extends also to model output. Modgen includes a powerful language to handle continuous time tabulation. These tabulations are created on the fly when simulations are run, and the programming to generate them usually requires only a few lines of code per table. Modgen also has a built-in mechanism for estimating the Monte Carlo variation of any cell of any table, without requiring any programming by the model developer.

Being a simple model, RiskPaths does not make use of the full range of available Modgen language concepts and capabilities. Nor is the discussion in this chapter intended to replace existing Modgen documentation, such as the Modgen Developer's Guide. But by introducing the main concepts of Modgen programming, we aim to help you get started in Modgen model development and to engage in further exploration.


Basic Modgen concepts

Actor: An actor is the entity whose life is simulated in a Modgen model. A model's actor is often a person, although this is not a requirement--other models have been developed that use dwellings or occupations as actors. Nevertheless, in RiskPaths, the actor is a person, or more specifically a female (since it is a model for the study of childlessness).

State: States describe the characteristics of a model's actors. Some states can be continuous, such as age, whereas others are categorical, such as gender. For categorical states, the actual categories or levels are defined via Modgen's classification command.

Overall, there are two major kinds of states in Modgen--simple states and derived states, both of which are used by RiskPaths and both of which are declared within an actor declaration. A simple state is a state whose value can be initialized and changed by the code that a model developer creates. Simple states are changed by explicitly declared events. A derived state, on the other hand, is a state whose value is given as an expression which is normally derived from or based on other states. A derived state's values are automatically maintained by Modgen throughout a simulation run. An additional useful Modgen concept is the self-scheduling derived state. This is a state which changes in a predefined time sequence, such as integer_age, a state in RiskPaths that changes at each birthday.

Event: In Modgen, simulation takes place through the execution of events. Each event consists of two functions: a time function to determine the time of the next occurrence of the event, and an implementation function to determine the consequences when the event happens. RiskPaths has several events, including a mortality event, union formation and dissolution events, and a first pregnancy event.
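
The division of an event into a time function and an implementation function can be sketched language-agnostically as follows (this is illustrative Python, not Modgen syntax, and the hazards are invented):

```python
import math, random

class Person:
    def __init__(self):
        self.age = 0.0
        self.alive = True
        self.parity = 0

    # Time functions: waiting time until each event would next occur (assumed constant hazards).
    def time_first_birth(self):
        return -math.log(1.0 - random.random()) / 0.08 if self.parity == 0 else math.inf

    def time_death(self):
        return -math.log(1.0 - random.random()) / 0.02

    # Implementation functions: the consequences when an event happens.
    def do_first_birth(self):
        self.parity = 1

    def do_death(self):
        self.alive = False

person = Person()
while person.alive:
    # The scheduler executes whichever event is due first.
    waits = {"first_birth": person.time_first_birth(), "death": person.time_death()}
    event = min(waits, key=waits.get)
    person.age += waits[event]
    person.do_first_birth() if event == "first_birth" else person.do_death()

print(round(person.age, 1), person.parity)
```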

Parameter: Parameters are used to give model users a degree of control over the simulations they run. The ability to change different hazards or probabilities that affect various aspects of a simulation allows different scenarios to be explored. Parameters can have many dimensions (such as age, gender, and year) and are stored in .dat data files. In RiskPaths, there is one parameter file, Base(RiskPaths).dat, which stores parameters such as death probabilities by age and risks of first pregnancy by age group. More complex models will usually incorporate more than one .dat file.

Table: Modgen has a powerful cross-tabulation facility built in to report aggregated results in the form of tables. There are two central elements of a table declaration--its captured dimensions (defining when an actor enters and leaves a cell) and its analysis dimension (recording what happens while an actor is in that cell). When running simulations, the tabulations to fill a table are created on the fly, thus removing the need to create and write to large temporary interim files for subsequent reporting. Several examples of table declarations are shown later in this document for the RiskPaths model.


How to use BioBrowser: The basics

Beginning a BioBrowser session

To begin a BioBrowser session, click the Start button on the Windows taskbar, choose Programs, and then select BioBrowser.

Opening an existing database file: The File / Open Database command

BioBrowser automatically invokes this command at the start of every session. A database file (created by Modgen) must be open before any graphical representations (saved as a biography) can be created.  Although only one database file can be open at a time, any number of biographies can be viewed simultaneously.

A sample database demo(trk).mdb has been supplied as part of the BioBrowser installation.

Dialog box to open demo(trk).mdb database file

Opening a saved biography: The File / Open command

After you have opened the database upon the startup of BioBrowser, you will be asked to either create a new biography or open a saved biography.  One saved biography, demo.bbr, has been supplied with the installation software.  Choose the Open button.

Information dialog box for opening biography files

You will be prompted for the name of the saved file to open. Choose demo.bbr.

Dialog box to open the saved biography demo.bbr

Creating a new biography: The File / New Biography command

Creating a new biography involves the selection of the actors which you want to graph by choosing a starting actor and a filter.  The way in which the open database was defined in the Modgen model will limit the available choices.  The starting actor is the type of actor whose state characteristics will be graphed.  Other actors who are linked to these starting actors (e.g. parents, spouses or children) may be added later in the BioBrowser process.  In our example, two starting actors are available: (i) persons, whose states are included in the database only if they are dominant actors or other married or remarried individuals, and (ii) children.

This choice of starting actor may give a large set of actors to be graphed, depending on the size of the Modgen database.  The filter criteria enable you to narrow the focus of the biography. Select a state, an operator on that state, and a value.  This generates a SQL query on the database.  The result of the query is a set of actors which satisfy the filter criteria.  At present, up to two states can be used to define the filter.  When selecting two states, you must choose whether the criteria will be joined with an “And” or an “Or” condition. If the result of the query is non-empty, then a new biography is opened. Once the biography is open, you may change the criteria using the menu item Filter/Criteria…, as discussed in Changing the Biography Filter.

The tracking state is a variable which indicates the ranges of time in which the actors and their states are included in the database.  If you wish to browse all the starting actors, set the filter to Tracking = TRUE in the Filter Criteria section of the Biography Filter dialog box.

In the “Filter Description” box you may change the textual description of the filter you have chosen.  It will appear at the bottom of the BioBrowser screen.

In the example below, all person actors whose dominant state is True were selected for browsing. This is a logical state whose value is either True or False for the actor’s lifetime. Since the demo(trk).mdb file was created with only 20 cases, all 20 actors will meet this criterion.

Biography Filter

The new biography contains only one display band, the navigation band for the filter tracking state.  It indicates the dates during which the actor’s state characteristics were captured by the model (the axis at the bottom of the screen indicates the start and end dates).  The display band does not have to be continuous.  At the top of this display band, the number of actors which satisfied the filter criteria is displayed, as well as which actor is currently shown on the screen.  In the example below, the first of the twenty filtered-in actors is displayed. The section State Selection and Navigation explains how more states can be added to the biography.

Sample biography display screen

Saving a biography: The File / Save and File / Save As... commands

Once all desired states have been added (and formatted for style and colour), you may optionally save the biography to a file. These files have the extension .bbr and may be retrieved at a later time against a compatible database.  For compatibility, the filter query must be non-empty and all previously selected actor/state pairs must exist in the open database. All style, colour and navigation positions are saved.

Dialog box to save a biography file

If you have altered the state data within a biography, the window caption will display an asterisk (*) after the file name until you save the biography.  This visual asterisk cue is not set, however, by navigation or by changes to global biography options.


Microsimulation approaches


Introduction

This document provides a breakdown of contrasting microsimulation approaches that come into play when we simulate societies with a computer. These approaches can in turn be grouped by purpose, by scope, and by the methods used to simulate populations.

With respect to purpose, we mainly distinguish between prediction and explanation, which turns out to also be the distinction in purpose between data-driven empirical microsimulation on one hand and agent based simulation on the other. The prediction approach is further subdivided into a discussion of projections versus forecasts.

Two aspects of the scope of a simulation are covered – we first distinguish general models from specialized ones, then population models from cohort models.

Finally, looking at the methods by which we simulate populations, we focus our discussion in three ways. The first is the type of population we simulate, which leads us to distinguish open versus closed population models, as well as cross-sectional versus synthetic starting populations. The second is the time framework used, either discrete or continuous. The third is the order in which lives are simulated, leading to either a case-based or a time-based model.

Purposes of microsimulation: explanation versus prediction

Modeling is abstraction, a reduction of complexity by isolating the driving forces of studied phenomena. The quest to find a formula for human behaviour, especially in economics, is so strong that over-simplified assumptions are an often accepted price for the beauty or elegance of models. The notion that beauty lies in simplicity is even found in some agent based models. Epstein draws an especially appealing analogy between agent based simulation and the paintings of French impressionists, one of these paintings (a street scene) being displayed on the cover of 'Generative Social Science' (Epstein 2006). Individuals in all their diversity are only sketched by some dots, but looking from some distance, we are able to clearly recognize the scene.

Can statistical and accounting models compete in beauty with the emergence of social phenomena from a set of simple rules? Hardly--they are complex in nature and require multitudes of parameters. While statisticians might still find elegance in regression functions, beauty is hard to maintain when it comes to filing tax returns or claiming pension benefits. Accounting is boring for most of us, and models based on a multitude of statistical equations and accounting rules can quickly become difficult to understand. So how can microsimulation models compensate for their lack of beauty? The answer is simple: usefulness. In essence, a microsimulation model is useful if it has predictive or explanatory power.

In agent based simulation, explanation means generating social phenomena from the bottom up, the generative standard of explanation being epitomized in the slogan: If you didn't grow it, you didn't explain it (which is regarded as a necessary but not sufficient condition for explanation). This slogan expresses the agent based community's critique of mainstream economics, with the latter's focus on equilibria without paying too much attention to how or whether those equilibria can ever be reached in reality. Again, agent based models follow a bottom-up approach of generating a virtual society. Their starting points are theories of individual behaviour expressed in computer code. The spectrum of how behaviour is modeled thereby ranges from simple rules to a distributed artificial intelligence approach. In the latter case, the simulated actors are 'intelligent' agents. As such, they have receptors; they get input from the environment. They have cognitive abilities, beliefs and intentions. They have goals, develop strategies, and learn from both their own experiences and those of other agents. This type of simulation is currently almost exclusively done for explanatory purposes. The hope is that the phenomena emerging from the actions and interactions of the agents in the simulation have parallels in real societies. In this way, simulation supports the development of theory.

The contrast to explanation lies in detailed prediction, which constitutes the main purpose of data-driven microsimulation. If microsimulation is designed and used operationally for forecasting and policy recommendations, such models "need to be firmly based in an empirical reality and its relations should have been estimated from real data and carefully tested using well-established statistical and econometric methods. In this case the feasibility of an inference to a real world population or economic process is of great importance" (Klevmarken, 1997).

To predict the future state of a system, there is also a distinction to be made between projections and forecasts. Projections are 'what if' predictions. Projections are always 'correct', based on the assumptions that are provided (as long as there are no programming errors). Forecasts are attempts to predict the most likely future, and since there can only be one actual future outcome, most forecasts therefore turn out to be false. With forecasts, we are not just simply trying to find out 'what happens if' (as is the case with projections); instead, we aim to determine the most plausible assumptions and scenarios, thus yielding the most plausible resulting forecast. (It should be noted, however, that implausible assumptions are not necessarily without value. Steady-state assumptions are examples of assumptions that are conceptually appealing and therefore very common but usually implausible. Under such assumptions, individuals are aged in an unchanging world with respect to the socioeconomic context, such as economic growth and policies, and individual behaviour is 'frozen', not allowing for cohort or period effects. Since a cross-section of today's population does not result from a steady-state world, the 'freezing' of individual behaviour and the socioeconomic context can help to isolate and study future dynamics and phenomena resulting from past changes, such as population momentum.)

How different is explanation from prediction? Why can't we rephrase the previous slogan to: If you didn't predict it, you didn't explain it? First, being able to produce good predictions does not necessarily imply a full understanding of the operations underlying the studied processes. We don't need a full theoretical understanding to predict that lightning is followed by thunder or that fertility is higher in certain life course situations than in others. Predictions can be fully based on observed regularities and trends. In fact, theory is often sacrificed in favour of a highly detailed model that offers a good fit to the data. This, of course, is not without danger. If behaviours are not modeled explicitly, then neither are the corresponding assumptions, which can make the models difficult to understand. We can end up with black-box models. On the other hand, agent based models, while capable of 'growing' some social phenomena, do so in a very stylized way. So far, these models have not reached any sufficient predictive power. In the data-driven microsimulation community, agent based models are thus often regarded as toy models.

Whatever the reason for developing a microsimulation model, however, modellers will typically experience one positive side effect from the exercise: the clarification of concepts. Modeling behaviour requires a level of precision (eventually transferred into computer code) that is not always found in social science, which has an abundance of purely descriptive theory. It is safe to say that the process of modeling itself generates new insights into the processes being modeled (e.g. Burch 1999). While some of these benefits can be experienced in all statistical modeling, simulation adds to the potential. By running a simulation model, we always gain insights both into the reality we are trying to simulate and into the operation of our models and the consequences of our modeling assumptions. In this sense, microsimulation models are always explorative tools, whether their main purpose is explanation or prediction. Or to put it differently, microsimulation models provide experimental platforms for societies, where the possibility of genuine natural experiments is limited by nature.

General versus specialized models

The development of larger-scale microsimulation models typically requires a considerable initial investment. This is especially true for policy simulations. Even if we are only interested in the simulation of one specific policy, we have to create a population and model the demographic changes before we can add the economic behaviour and accounting routines necessary for our study. This can create a situation where it becomes more logical to design microsimulation models as 'general purpose' models, thereby attracting potential investors from various fields. A model capable of detailed pension projections might easily be extended to other tax benefit fields. A model including family structures might be extended to simulate informal care. A struggle for survival can even lead to rather exotic applications–for example, one of the largest models, the US CORSIM model, survived difficult financial times by receiving a grant from a dentists' association interested in a projection of future demand for dental prostheses!

It is not surprising, therefore, that there is a general tendency to plan and develop microsimulation applications as general, multi-purpose models right from the beginning. In fact, large general models currently exist for many countries, as shown in the following table.

Large general models by country (Country: Models)
Australia: APPSIM, DYNAMOD
Canada: DYNACAN, LifePaths
France: DESTINIE
Norway: MOSART
Sweden: SESIM, SVERIGE
UK: SAGEMOD
USA: CORSIM

In creating general models, both the control of ambitions and modularity in the design are crucial for success. Only a few of today's large models have actually reached and stayed at their initially planned sizes. Overambitious approaches have had to be corrected by considerable simplifications, as was the case with DYNAMOD, which was initially planned as an integrated micro–macro model.

Specialized microsimulation models concentrate on a few specific behaviours and/or population segments. An example is the NCCSU Long-term Care Model (Hancock et al., 2006). This model simulates the incomes and assets of future cohorts of older people and their ability to contribute towards home care fees. It thereby concentrates on the simulation of the means test of long-term care policies, with the results fed into a macro model of future demands and costs.

Historically, it has also been the case that some models which started off as rather specialized models ended up growing into more general ones. This happened with SESIM and LifePaths, both initially developed for the study of student loans. LifePaths is a particularly interesting example as it not only grew into a large general model but also constituted the base, in a stripped-down version, of a separate family of specialized health models (Statistics Canada's Pohem models).

Cohort versus population models

Cohort models are specialized models, as opposed to general ones, since they only simulate one population segment, namely one birth cohort. This is a useful simplification if we are only interested in studying one cohort or comparing two distinct cohorts.

Economic single cohort studies typically investigate lifetime income and the redistributive effects of tax benefit systems over the life course. Examples of this kind of model include the HARDING and LIFEMOD models developed in parallel, the former for Australia, and the latter for Great Britain (Falkingham and Harding 1996). This kind of model typically assumes a steady-state world, i.e., the HARDING cohort is born in 1960 and lives in a world that looks like Australia in 1986.

Population models deal with the entire population and not just specific cohorts. Not surprisingly, several limitations of cohort models are removed when simulating the whole population, including demographic change issues and distributional issues between cohorts (like intergenerational fairness).

Open versus closed population models

On a global scale, the human population is a closed one. Everybody has been born and will eventually die on this planet, has biological parents born on this planet, and interacts with other humans all sharing these same traits. But when focusing on the population of a specific region or country, this no longer holds true. People migrate between regions, form partnerships with persons originating in other regions, etc. In such cases, we are dealing with open populations. Since in a simulation model we are almost never interested in modeling the whole world population, how can we deal with this problem?

The solution usually requires a degree of creativity. For example, when allowing immigration, we always face the problem of finding ways to model a specific country without modeling the rest of the world. With respect to immigration, many approaches have been adopted, ranging from the cloning of existing 'recent immigrants' to sampling from a host population or even from different 'pools' of host populations representing different regions.

Conceptually more demanding is the simulation of partner matching. In microsimulation, the terms closed and open population usually correspond to whether the matching of spouses is restricted to persons within the population (closed) or whether spouses are 'created on demand' (open). When modeling a closed population, we have the problem that we usually simulate only a sample of a population and not the whole population of a country. If our sample is too small, it becomes unlikely that reasonable matches can be found within the simulated sample. This holds especially true if geography is also an important factor in our model. For example, if there are not many individuals representing the population of a small town, then very few of them will find a partner match within a realistic distance.

The main advantages of closed models are that they allow kinship networks to be tracked and that they enforce more consistency (assuming that they have a large enough population to find appropriate matches). Major drawbacks of closed models, however, are the sampling problems and computational demands associated with partner matching. In a starting population derived from a sample, the model may not be balanced with respect to kinship linkages other than spouses, since a person's parents and siblings are not included in the base population if they do not live in the same household (Toder et al. 2000).

The modeling of open populations requires some abstraction. Here, partners are created on demand - with characteristics synthetically generated or sampled from a host population – and are treated more as attributes of a 'dominant' individual than as 'full' individuals. While their life courses (or some aspects of interest for the simulation of the dominant individual) are simulated, they themselves are not accounted for as individuals in aggregated output.

Cross-sectional versus synthetic starting populations

Every microsimulation model has to start somewhere in time, thus creating the need for a starting population. In population models, we can distinguish two main starting population types: cross-sectional and synthetic. In the first case, we read in a starting population from a cross-sectional dataset and then age all individuals from this moment until death (while of course also adding new individuals at birth events). In the second case we follow an approach typically also found in cohort models–all individuals are modeled from their moment of birth onwards.

If we are only interested in the future, why would we want to start with a synthetic population that would also force us to simulate the past? Certainly, starting from a cross-sectional dataset can be simpler. When we start from representative 'real data', we do not have to retrospectively generate a population, implying that we do not need historical data to model past behaviour. Nor do we have to concern ourselves with consistency problems, since simulations starting with synthetic populations typically lack full cross-sectional consistency.

Unfortunately, many microsimulation applications do need at least some biographical information not available in cross-sectional datasets. For example, past employment and contribution histories determine future pensions. As a consequence, some retrospective or historical modeling will typically be required in most microsimulation applications.

One way to avoid a synthetic starting population when historical simulation is in fact needed is to start from an old survey. This idea was followed in the CORSIM model, which used a starting population from a 1960 survey (which also makes this model an interesting subject of study in itself). While the ensuing possibility of creating retrospective forecasts can help assess the model's quality against reality, such an approach nevertheless has its own problems. CORSIM makes heavy use of alignment techniques to recalibrate its retrospective forecasts to published data. Even if many group and aggregate outcomes can be exactly aligned to recent data, there is no way of assuring that the joint distributions based on the 1960 data remain accurate after several decades.

When creating a synthetic starting population, everything is imputed. We thus need models of individual behaviour going back a full century. While such an approach is demanding, it has its advantages. First, the population size is not limited by a survey; we are able to create larger populations, thus diminishing Monte Carlo variability. Second, because the population is created synthetically, we avoid confidentiality conflicts. (Statistics Canada follows this approach in its LifePaths model.) Overall, the more that past information has to be imputed, or the more crucial that past information is for what the application is attempting to predict or explain, the more attractive the approach of a synthetic starting population becomes. For example, Wachter (Wachter 1995) simulated the kinship patterns of the US population following a synthetic starting population approach that went back to the early 19th century. Such detailed kinship information is not found in any survey and thus can be constructed only by means of microsimulation.

Continuous versus discrete time

Models can be distinguished by their time framework, which can be either continuous or discrete. Continuous time is usually associated with statistical models of durations to an event, following a competing risk approach. Beginning at a fixed starting point, a random process generates the durations to all considered events, with the event occurring closest to the starting point being the one that is executed while all others are censored. The whole procedure is then repeated at this new starting point in time, and this cycle keeps on occurring until the 'death' event of the simulated individual takes place.

Figure 1 illustrates the evolution of a simulated life course in a continuous time model. At the beginning, there are three events (E1, E2, E3), each of which has a randomly generated duration. In the example, E1 occurs first so it becomes the event that is executed; after that, durations for the three events are 're-determined'. However, because E3 is not defined to be contingent on E1 in the example, its duration remains unchanged, whereas new durations are re-generated for E1 and E2. E3 ends up having the next smallest duration so it is executed next.  The cycle then continues as durations are again re-generated for all three events.

Figure 1: Evolution of a simulated life course
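
A hedged sketch of the cycle described above and in Figure 1 (the hazards are invented, and for simplicity all durations are simply redrawn after each event rather than keeping the non-contingent ones):

```python
import math, random

hazards = {"E1": 0.10, "E2": 0.05, "E3": 0.02}   # E3 plays the role of the death event

def duration(hazard):
    # Random duration to an event under a constant hazard (exponential waiting time).
    return -math.log(1.0 - random.random()) / hazard

def simulate_life_course():
    t, history = 0.0, []
    while True:
        # Generate durations for all competing events; the earliest is executed, the rest are censored.
        durations = {event: duration(h) for event, h in hazards.items()}
        event = min(durations, key=durations.get)
        t += durations[event]
        history.append((round(t, 2), event))
        if event == "E3":              # the 'death' event ends the simulated life
            return history

print(simulate_life_course())
```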

Continuous time models are technically very convenient, as they allow new processes to be added without changing the models of the existing processes as long as the statistical requirements for competing risk models are met (See Galler 1997 for a description of associated problems).

Modeling in continuous time, however, does not automatically imply that there are no discrete time (clock) events. Discrete time events can occur when time-dependent covariates are introduced, such as periodically updated economic indices (e.g. unemployment) or flow variables (e.g. personal income). The periodic update of indices then censors all other processes at every periodic time step. If the interruption periods are so short (e.g. one day) that the maximum number of other events within a period virtually becomes one, such a model has converged towards a discrete time model.

Discrete time models determine the states and transitions for every time period while disregarding the exact points of time within the interval. Events are assumed to happen just once in a time period. As several events can take place within one discrete time period, either short periods have to be used to avoid the occurrence of multiple events or else all possible combinations of single events have to be modeled as events themselves. Discrete time frameworks are used in most dynamic tax benefit models, with the older models usually using a yearly time framework mainly due to computational restrictions. With computer power becoming stronger and cheaper over time, however, shorter time steps can be expected to become predominant in future models. When time steps become so short that we can virtually exclude the possibility of multiple events, we have reached 'pseudo-continuity'. In this case we can even use statistical duration models. An example of the combination of both approaches is the Australian DYNAMOD model.
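
For contrast, a minimal sketch of a discrete time model with a yearly step (the probabilities are invented): each event is evaluated at most once per period, and the exact timing within the year is ignored.

```python
import random

P_DEATH = 0.01          # invented yearly probability of death
P_FIRST_BIRTH = 0.08    # invented yearly probability of a first birth (if childless)

def yearly_step(person):
    # Each event can happen at most once per period; timing within the year is ignored.
    if random.random() < P_DEATH:
        person["alive"] = False
        return
    if person["parity"] == 0 and random.random() < P_FIRST_BIRTH:
        person["parity"] = 1
    person["age"] += 1

person = {"age": 20, "alive": True, "parity": 0}
while person["alive"] and person["age"] < 100:
    yearly_step(person)

print(person)
```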

Case-based versus time-based models

The distinction between case-based and time-based models lies in the order in which individual lives are simulated. In case-based models one case is simulated from birth to death before the simulation of the next case begins. Cases can be individual persons or a person plus all 'non-dominant' persons that have been created on demand for this person. In the latter situation, all lives pertaining to a given case are simulated simultaneously over time.

Case-based modeling is only possible if there is no interaction between cases. Interactions are limited to the persons belonging to a case, thereby imposing significant restrictions on what can be modeled. The advantage of such models is of a technical nature--because each case is simulated independently of the others, it is easier to distribute the overall simulation job to several computers. Furthermore, memory can be freed after each case has been simulated, since the underlying information does not have to be stored for future use. (Case-based models can also be used only with open population models, not closed ones.)

In time-based models, all individuals are simulated simultaneously over a pre-defined time period. Because all individuals are aging simultaneously (as opposed to just the individuals in one case), the computational demands definitely increase. In a continuous time framework, the next event that happens is the first event scheduled within the entire population. Thus, computer power can still be a bottleneck for this kind of simulation – current models in use typically have population sizes of less than one million.
