Attention-deficit/hyperactivity disorder (ADHD), characterized by hyperactivity, impulsiveness and deficient sustained attention, is one of the most common and persistent behavioral disorders of childhood. ADHD is associated with catecholamine dysfunction. The catecholamines are important for response selection and memory formation, and dopamine in particular is important for reinforcement of successful behavior. The convergence of dopaminergic mesolimbic and glutamatergic corticostriatal synapses upon individual neostriatal neurons provides a favorable substrate for a three-factor synaptic modification rule underlying acquisition of associations between stimuli in a particular context, responses, and reinforcers. The change in associative strength as a function of delay between key stimuli or responses, and reinforcement, is known as the delay of reinforcement gradient. The gradient is altered by vicissitudes of attention, intrusions of irrelevant events, lapses of memory, and fluctuations in dopamine function. Theoretical and experimental analyses of these moderating factors will help to determine just how reinforcement processes are altered in ADHD. Such analyses can only help to improve treatment strategies for ADHD.
Attention-deficit/hyperactivity disorder (ADHD) is one of the most common and persistent behavioral disorders of childhood, consisting of developmentally inappropriate, persistent, and impairing levels of hyperactivity, impulsiveness and inattention. The prevalence of the disorder is similar across cultures [2-4]; about 5% of school-aged children and 4% of adults are affected worldwide. The disorder places the child at increased risk of school failure, juvenile delinquency, criminality, substance abuse, and HIV/AIDS as a consequence of sexual promiscuity and disregard for preventative measures [7-9]. For these reasons, the disorder is extremely costly to afflicted individuals, their families, and their society [10,11]. Despite being one of the most intensively studied psychiatric disorders, its etiology, diagnosis, and optimal treatment strategies remain subjects of debate and controversy.
Genetic factors have been identified, probably producing alterations in catecholaminergic regulation of brain function in frontosubcortical pathways [13,14]. At a behavioral level, children with ADHD respond atypically to reinforcers whether they are tangible rewards or social praise; they are less able to delay gratification and often fail to respond to discipline [15-21]. Compared to typically-developing peers, they perform less well under schedules of partial reinforcement [22,23]. Children with ADHD also respond more impulsively during delayed reinforcement in that they are more likely than typically-developing peers to choose small immediate reinforcers over larger delayed reinforcers [16,24-27]. Atypical response to reinforcement is a pervasive and fundamental characteristic of ADHD, which has important implications both for understanding the brain mechanisms underlying the disorder and for the development of effective behavioral and pharmacological interventions.
There have been many attempts to explain the origins of ADHD symptoms. A dual-process theory [17,28-31] suggests that less efficient reinforcement processes may explain several of the characteristic behavioral patterns. The temporal relationship between stimulus, response, and reinforcer strongly influences the effectiveness of reinforcers. For reinforcement to alter behavior, events need to occur within a limited time frame, but the extent of this time frame also depends on attentional and memorial variables. This is important both in basic laboratory research, where it is often overlooked, and in analysis of ADHD, which is associated with poor attention and memory [32,33].
This paper will first briefly describe the role of the catecholamines in response selection and memory formation before reviewing the neurobiological bases of reinforcement in general and discriminated response learning (i.e. stimulus-response-reinforcement learning) in particular. We then explore the delay-of-reinforcement gradient, which describes the temporal window for the association of predictive cues with behavior and its consequences. Alterations in the shape of the delay gradient may be directly linked to dopamine dysfunction, but may also be secondary to changes in attention and memory processes. Further, we briefly describe how the core symptoms of ADHD can be explained by a steepened delay-of-reinforcement gradient. Finally, based on operant theory and empirical findings, we describe behavioral procedures for minimizing the effects of a steepened gradient, and discuss challenges for reinforcement theories of ADHD.
The role of catecholamines in response selection and memory formation
Behavior is guided by neural representations of previous experience. These memories are encoded in neural networks that represent the different elements of perception, motor response, and their consequences, as well as the associated cues that predict the outcome. Increased synaptic efficacy, long-term potentiation (LTP), is commonly regarded as a prime candidate for mediating learning and memory [34,35]. Hebb proposed that connections between two neurons are strengthened when one neuron repeatedly or persistently takes part in firing the other neuron (presynaptic and postsynaptic activity being the two factors), a proposition now known as Hebb's rule. The catecholamines dopamine and norepinephrine are required for selection and strengthening of responses that produce the reinforcer (reward). They also play an essential role in working memory (immediate; lasting seconds), short-term memory (seconds to minutes), and long-term memory (hours to years).
The noradrenergic system is part of a coordinated structure that promotes behavioral adaptation to novel environments. Noradrenergic neurons fire phasically in response to novel stimuli as well as to changes in environmental contingencies [37,38]. The norepinephrine projection to the prefrontal cortex is engaged by novel action-outcome contingencies, compatible with a role in mechanisms of plasticity and new learning. At a cellular level, norepinephrine strengthens synaptic connections in neural circuits and thereby reinforces attention and memory.
Dopamine is essential for both LTP and long-term depression (LTD) in brain areas that are critically involved in learning. Dopamine activation of D1 receptors mediates reinforcement of behavior by strengthening synaptic connections between neurons (LTP) or weakening synaptic connections (LTD) in neural circuits that involve the prefrontal cortex and/or striatum (cortico-cortical and/or cortico-striato-thalamo-cortical circuits). Dopamine modulation of LTP is probably the neurobiological basis of reinforcement of behavior, whereas dopamine-induced LTD may be the mechanism that underlies extinction processes [17,43].
Response selection is sensitive to contextual factors. Input from the hippocampus gates the prefrontal cortex input and facilitates behavioral output based on the current context of the situation (time and place) or past experiences with the stimulus. Dopamine projections to prefrontal cortex, hippocampus, and amygdala directly influence transmission in neural networks that involve these structures. Dopamine is further involved in memory processes by modulating neurons in the prefrontal cortex that are active during the delay interval between a stimulus presentation and a response, and may regulate working memory and attention [13,45].
Neurobiology of reinforcement
Most investigators agree that mesolimbic and mesostriatal dopamine systems contribute to the psychological functions of reward (incentive) and reinforcement-related learning, strengthening or increasing the probability of future occurrences of the behavior that preceded the reinforcer. However, the exact role of these dopamine systems has been controversial.
Several associative processes occur during learning on the basis of positive reinforcement. These include classical conditioning (stimulus-stimulus association), habit formation (stimulus-action association), and learning of action-outcome contingencies. These processes are associated with activity in specific brain regions and can be shown to be selectively impaired by damage to those regions. Here, we focus on discriminated response learning (learning that a response may be followed by a reinforcer only in the presence of a particular stimulus), which involves all of these processes.
A considerable body of evidence from single neuron recordings in monkeys indicates that dopamine cells fire phasically in response to unpredicted primary and secondary reinforcers [48,49]. Dopamine release brought about by phasic activity of dopamine neurons appears to be necessary for learning on the basis of positive reinforcement [50,51]. In particular, the majority of dopamine neurons show phasic activation after unpredicted primary reinforcers and conditioned, reinforcer-predicting stimuli (conditioned reinforcers), but not to aversive events which inhibit dopamine cell firing [48,52]. In these experiments, bursts of action potentials occur initially in response to a liquid reinforcer, then to solenoid clicks that precede delivery of the reinforcer. These clicks and other sensory cues associated with the primary reinforcer become secondary reinforcers. After training, the dopamine neurons fire at the occurrence of the earliest cue that predicts the reinforcer.
Rodent, primate, and human studies provide evidence that the striatum plays a key role in learning based on positive reinforcement. In rats, lesions of the dorsal striatum impair acquisition of tasks requiring discriminated response learning. Behavioral measures in humans with neurodegenerative diseases of the striatum also provide evidence of its role in discriminated response learning.
Anatomically, the neostriatum is in a unique position to integrate the three factors of stimulus, response and reinforcement. The striatum receives input from nearly all areas of the cerebral neocortex in a topographical fashion. The inputs from the neocortex make direct synaptic contact with the spiny neurons of the neostriatum. These neurons, in turn, project back to the neocortex via the thalamus. Hence, the nature of the corticostriatal inputs and the input-output relationship with the spiny projection neurons are the crucial determinants of striatal output. The neostriatum also receives input from the dopaminergic neurons. As noted above, the dopamine-producing neurons of the pars compacta of the substantia nigra display short periods of increased activity after the unpredicted presentation of food or liquid reinforcers and are believed to be involved in acquisition of behavior based on reinforcement. Nigral dopaminergic neurons project predominantly to the neostriatum where they converge with the inputs from the neocortex, amygdala and hippocampus. The convergence of dopaminergic and corticostriatal synapses upon individual neostriatal neurons provides a favorable substrate for a three-factor synaptic modification rule because it brings together the processes of three groups of cells (neocortical, neostriatal, and dopaminergic neurons). The three-factor synaptic modification rule was proposed as a cellular reinforcement mechanism for discriminated response learning, in which the situation is represented by the neocortical state (presynaptic activity), responses are represented by neostriatal neural activity (postsynaptic activity), and dopaminergic neurons encode reinforcing events (the third factor). The conjunction of these three factors was proposed to underlie learning by strengthening the synapses connecting the cerebral cortex to the striatum [55-57].
A conjunction of neocortical and striatal activity in the absence of the reinforcing dopamine signal was proposed to underlie extinction by weakening the active synapses.
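As a rough illustration, the three-factor rule just described can be written as a weight-update sketch. This is not a published model: the function name, binary activity values, and learning rates are hypothetical simplifications chosen for exposition.

```python
# Illustrative sketch of the three-factor synaptic modification rule.
# A corticostriatal weight is strengthened (LTP) only when presynaptic
# cortical activity, postsynaptic striatal activity, and a phasic
# dopamine signal coincide; a pre/post conjunction without dopamine
# weakens the weight (LTD), mirroring the proposed extinction mechanism.

def three_factor_update(w, pre, post, dopamine,
                        lr_ltp=0.1, lr_ltd=0.05):
    """Return the updated corticostriatal synaptic weight."""
    conjunction = pre * post          # Hebbian coincidence (factors 1 and 2)
    if conjunction > 0:
        if dopamine > 0:              # factor 3 present -> LTP
            w += lr_ltp * conjunction * dopamine
        else:                         # conjunction without dopamine -> LTD
            w -= lr_ltd * conjunction
    return w                          # no pre/post conjunction: no change

w = three_factor_update(1.0, pre=1.0, post=1.0, dopamine=1.0)
assert w > 1.0                        # reinforced: strengthened
w = three_factor_update(1.0, pre=1.0, post=1.0, dopamine=0.0)
assert w < 1.0                        # extinction-like: weakened
w = three_factor_update(1.0, pre=0.0, post=1.0, dopamine=1.0)
assert w == 1.0                       # no conjunction: unchanged
```

The point of the sketch is only that all three terms must be nonzero for strengthening to occur, which is what distinguishes this rule from plain two-factor Hebbian learning.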
Demonstration of the operation of a three-factor rule for synaptic modification in the dorsal part of the neostriatum was first reported on the basis of experiments in brain slices. The three-factor hypothesis was tested by ejecting small pulses of dopamine to coincide with the conjunction of neocortical and neostriatal activity. Pulsatile ejection of dopamine, mimicking the effects of phasic reinforcer-related firing of dopamine cells, caused LTP of neocortical inputs. In the absence of pulsatile dopamine, LTD was induced. Thus, pulsatile dopamine stimulation activated a molecular switch that converted LTD into LTP. More recently, it has been shown that the timing of back-propagating postsynaptic action potentials relative to arriving corticostriatal excitatory inputs determines whether LTP or LTD takes place and that dopamine receptor activation is required for both LTD and LTP induction.
The functional significance of this three-factor rule is that striatal projection neurons effectively encode the integrated history of reinforcement of actions performed in specific situations. The effectiveness of their synaptic inputs from the neocortex translates the current cortical activity pattern into a value representing the probability of reinforcement. This is because each instance of reinforcement produces an incremental increase in the effectiveness of the contributing synapses, so their effectiveness comes to represent the integrated history of reinforcement over time. Since these are excitatory inputs, their effectiveness is translated into depolarization of the postsynaptic neurons, the activity of which provides a readout of the expectation of reinforcement in the particular context of cortical activity [60,61].
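The phrase "integrated history of reinforcement" can be made concrete with a standard formal sketch (our illustration, not a claim about the underlying biophysics): a leaky running average in which each reinforced (r = 1) or non-reinforced (r = 0) conjunction nudges synaptic effectiveness toward the long-run reinforcement probability. The learning rate is hypothetical.

```python
# Illustrative leaky integrator: effectiveness w comes to track the
# running probability of reinforcement in this cortical context.

def update_effectiveness(w, r, alpha=0.1):
    """Move effectiveness a fraction alpha toward the outcome r."""
    return w + alpha * (r - w)

w = 0.0
for r in [1, 1, 0, 1] * 250:          # reinforced on 75% of occasions
    w = update_effectiveness(w, r)

assert 0.6 < w < 0.9                  # w has settled near 0.75
```

Because recent outcomes are weighted more heavily than old ones, the readout also adapts when the reinforcement contingency changes, consistent with the idea of an expectation computed over experience.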
The three-factor rule, in the context of the corticostriatal pathway, provides a plausible cellular mechanism for selecting responses that have been reinforced in the past. The three-factor rule operates within a limited temporal window. Reinforcer-related release of dopamine must coincide with synaptic activity representing behavior and the situation within a short (subsecond) time interval for LTP to take place. In the following section, we examine the temporal constraints of the three-factor rule from a behavioral perspective. The delay-of-reinforcement gradient is a central concept in operant theory, elaborating the three-factor rule along a temporal dimension. In the following section, the origin of the delay-of-reinforcement gradient, its direction, and its relation to attention and memory processes are discussed. Knowledge of the various components and processes feeding into the reinforcement and response selection processes is important when investigating how reinforcers act.
Delay of reinforcement – decay gradients
A reinforcer is not defined in terms of previous events; it is defined in terms of what happens next, by the behavioral changes that follow reinforcement. Reinforcers act on responses in the same class as those that preceded their presentation, within a limited time frame (seconds) from the occurrence of the behavior to the perception of its consequences. This is the case for humans along with other animals; but for humans that brief window may be enormously expanded by verbally formulated prompts or rules, such as "Last time when I did my homework, I got a smile from the teacher; I want that again." In turn, the utility of such prompts depends on the ability of the individual to keep them active in mind and use them to guide behavior. Delays between action and outcome impair conditioning: The strength of a discriminated response association is inversely related to the delay between the response and the reinforcer [64-66]. This does not mean that strength dissipates over delay; reinforcers strengthen the response state that the animal is in. Effects of delay arise because the longer the time between the response and the reinforcer, the more likely it is that the animal has left the behavioral state it was in when it responded, so some other state, i.e. some other neuronal activity, will be erroneously reinforced. The weakening of conditioning with delay is a credit-allocation problem; precise allocation becomes more difficult with delay.
The origin of the gradient
Reinforcement strengthens associations between stimulus and response, context and consequence. When in the context, presentation of a discriminative stimulus prepares the organism for reinforcement, and may cause the rate of responding to increase, consistent with Skinner's definition of reinforcement. The process may also occur through classical conditioning (Pavlovian S-S association), causing the context, or predictive stimuli within it, to become associated with reinforcement, and thus to become attractive to the organism. There is a problem, however: A stimulus or response which occurs some time before a reinforcer cannot contemporaneously be joined with the reinforcer. The stimulus or response must leave some kind of trace that is present at the time of reinforcement. This trace may be conceptualized as a memory, a representation, a synaptic flow, or a reverberating circuit. How are such representations of appropriate stimuli and responses formed? The prior section proposes some neurobiological mechanisms; here we consider parallel behavioral mechanisms.
Consider the string of events experienced by the organism, leading up to the current position in time (Figure 1). Several events occur and then a signal event, a reinforcer, occurs. How is credit for the reinforcer allocated to precedent events? It is clear that immediately before the reinforcer there are only one or a few candidate events, and that their number grows exponentially as we back away. Not only must the multiple candidates be evaluated, but the complete path between them and the reinforcer must be considered. Even in an impoverished world with only two states, as shown in Figure 1, the number of candidate histories grows as 2^n, where n is the number of steps back in time that we consider. It follows that the attention that the organism pays to each path, the credit assigned to each, must decay geometrically with the time to the reinforcer.
Reinforcers do not act backwards in time, but on memory representations of preceding causes. Candidate causes proliferate as an effect is delayed. Absent special clues, this leads to a geometric decrease in credit that is available to be allocated to...
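The combinatorial argument can be stated numerically. In a minimal sketch with two hypothetical states per time step (as in Figure 1), candidate histories double with each step back from the reinforcer, so a fixed pool of credit spread evenly across them halves per step, i.e. decays geometrically.

```python
# With two possible events per time step, the number of candidate
# histories grows as 2**n with steps back from the reinforcer; credit
# spread evenly over those histories therefore decays geometrically.

def candidate_histories(n_steps_back, states_per_step=2):
    """Number of distinct event paths ending at the reinforcer."""
    return states_per_step ** n_steps_back

def credit_per_history(n_steps_back, total_credit=1.0, states_per_step=2):
    """Credit available to any single candidate path."""
    return total_credit / candidate_histories(n_steps_back, states_per_step)

assert candidate_histories(10) == 1024
assert credit_per_history(3) == 0.125    # 1 / 2**3
```

A richer world (more states per step) only steepens the decay, which is the sense in which event-dense environments make precise credit allocation harder.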
Consider the rudimentary case where a series of red and green lights are flashed, too quickly to count, and the subject is required to judge whether the sequence was predominantly red or green. Here the delay between each element is subject to the decay equation (see Appendix 1), because interposed between it and the end of the trial (and possible reinforcement) lies a series of other flashes which cannot be ignored. We may measure the importance of each element in the sequence by plotting how often the response (judgment) corresponded to the color of that element. Of course, sometimes the judgment was correct, sometimes incorrect, depending on the context of other colors. But averaged over these randomized presentations, it tells us how much credit was given to each element (Figure 2).
The decay in the influence of a stimulus as a function of the number of items intervening between it and reinforcement . The ordinates are the probability that a summary response that indicated the preponderant color (e.g., "mostly red") was the same...
The influence of events decreased rapidly with each interposition, approaching asymptote after 6 items. Note that an almost six-fold increase in the time between stimuli had no effect on the rate of memory decay: Events, not time, caused most of the forgetting in this study. Similar results were reported by Waugh and Norman. It is obvious that time is often not the proper independent variable in the study of memory decay; it is the number of events processed per unit of time that matters. Since in the flow of the real world the number of events is often inscrutable, however, time is often taken as its proxy. Outside the laboratory, experience and time are intrinsically correlated, so the common assumption that memory, and its neural substrates, decay over time is serviceable. It is also pragmatic, because, unless they are carefully manipulated by an experimenter, the stimuli and responses that fill the delay interval are more difficult to measure than the interval's temporal extent. But it is events in time, not time per se, that function as causes. The steeper delay gradients that often characterize hyperactive organisms may be due to the greater number of events they expose themselves to during delays.
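The event-based (rather than time-based) decay described above can be sketched as a trace multiplied by a fixed factor for each intervening item. The decay constant is hypothetical; the essential feature is that the function deliberately takes no time argument.

```python
# Event-driven trace decay: each interposed event multiplies the memory
# trace by a fixed factor, regardless of the clock time between events.

def trace_strength(n_intervening_events, decay_per_event=0.6):
    """Residual trace after a given number of interposed events."""
    return decay_per_event ** n_intervening_events

# Six items presented quickly or spread over a six-fold longer interval
# leave the same residual trace in this model: only event count matters.
six_items_fast = trace_strength(6)    # e.g. six items in 3 s
six_items_slow = trace_strength(6)    # same six items over 18 s
assert six_items_fast == six_items_slow
assert trace_strength(6) < trace_strength(1)
```

On a real-time axis the fast schedule would look like steeper decay; on an event axis the two curves coincide, as in the two forgetting functions of Figure 2.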
The direction of the delay-of-reinforcement gradient
We may plot the decrease in associative strength at temporal removes from reinforcement as in Figure 2. This is the classic delay of reinforcement gradient. It is misleading, however, if construed as mechanism. It is the ability to hold the response (and the associated stimuli) in memory that decays over time; reinforcers act on the decreasing tails of these memorial traces. Because those gradients decrease with time, they start at a near-maximal value at the time of the stimulus or response, and decay until the moment of reinforcement (the mirror image of the traditional representation). Gradients such as those shown in Figure 2 are a summary report of these processes, not the processes themselves.
The classic view is useful in the case of establishing new behavior with delayed reinforcers [17,29,71]; but it can be misleading when applied to the delay-of-reinforcement experiments so often utilized in the study of impulsivity. The underlying hypothetical gradients are often viewed as the strength of the pull toward the large-late or small-soon reinforcer, with the choice of the latter called impulsive, and explained in terms of steeper discount gradients [72,73]. If you judge a bird in the hand to be worth two in the bush, you are prudent; but if you think it worth four in the bush, you are impulsive. But the temporal discounting involved in such choice may have little to do with the steepness of trace gradients. In experiments with humans, the outcomes are presented verbally [64,74,75], and the obtained preferences and discount gradients are strongly influenced by the individual's ability to imagine these future situations, and relate them to his current desires. It is not so much a future event that is discounted, as the future self who will enjoy it.
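A common descriptive model of the temporal discounting involved in such choices (a standard formulation, not claimed as these authors' model) is hyperbolic discounting, V = A / (1 + kD). A larger k, corresponding to a steeper gradient, flips preference from the large-late to the small-soon reinforcer; the amounts and delays below are arbitrary illustrations.

```python
# Hyperbolic discounting sketch: V = A / (1 + k * D), where A is amount,
# D is delay, and k is the individual's discounting rate.

def discounted_value(amount, delay, k):
    return amount / (1.0 + k * delay)

small_soon = (2.0, 0.0)     # 2 units available now
large_late = (4.0, 10.0)    # 4 units after a 10 s delay

def prefers_small_soon(k):
    return discounted_value(*small_soon, k) > discounted_value(*large_late, k)

assert not prefers_small_soon(0.05)   # shallow discounting: waits (prudent)
assert prefers_small_soon(1.0)        # steep discounting: impulsive choice
```

Whether such a fitted k reflects the steepness of underlying trace gradients, or instead the subject's ability to imagine the delayed outcome, is exactly the question raised in the text.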
Delay gradients, working memory, and attention
As Henri Bergson noted, perception and consciousness did not evolve to provide entertainment, but to prepare us for action; that action is shaped by reinforcement. Reinforcement cannot act backward in time, but only on the palette of events carried to it by memory. Each new event crowds in to overshadow the traces of older memories. It overwrites them, to be overwritten in turn, and again with each new step through time and the events that time carries. Rich environments present the potential for a disastrously quick loss of ability to allocate credit to the correct precedent. For how long do you typically retain the name of a newly introduced person, when this is accompanied by their novel appearance and personal details, in a general context of other novelties?
Memory is a key player in these analyses, and the variety of memory that is most relevant is working memory. Working memory capacity characterizes the ability to hold and utilize elements of information in a computation or action after they are briefly presented, with key elements under threat of displacement by the manipulation of them or by other events. Think of doing multiplication problems in one's head, or remembering a phone number while engaging in the ongoing conversation. This ability to hold, or retrieve, a representation, may underlie our ability to learn through reinforcement. Reduced memory capacity would functionally steepen the delay-of-reinforcement gradient, because fewer of the behaviors and events preceding the reinforcer are represented in memory at the time of reinforcement.
It may seem that simple conditioning should not require such representation. This may be the case for delay conditioning, where the stimulus overlaps with the reinforcer. Awareness, however, is necessary for trace conditioning, where the stimulus is episodic and must be remembered during the delay [78,79]. Trace conditioning engages additional areas of the neocortex beyond those recruited by the simpler delay conditioning, in particular areas that maintain working memory processes.
Attention is related both to reinforcement and to working memory. The strength of the memory trace at the time of reinforcement depends on the attention originally allocated, the number of competing states, and the relative salience of each of the ensemble of traces. In an unfamiliar situation, attention is captured by novel, salient stimuli in a bottom-up, memory-free way (yesterday's orienting response, today's automatic capture of attention) [81-83]. As the process of reinforcement unfolds, predictive stimuli or responses are recognized and become established as discriminative stimuli: Relevant behaviors will be performed in the presence of the discriminative stimulus, while other behaviors will be avoided. In these situations, specific stimulus properties are actively attended to because they signal favorable consequences [82,83]. When a stimulus has acquired discriminative properties, attention is guided in a controlled, memory-dependent way, as a learned behavior shaped and maintained by reinforcement [82,83]. The consequences of attending thus change what is attended to [84,85]. A familiar example is the Wisconsin card sorting test where positive consequences are arranged for attending to one of the three dimensions on the stimulus-cards (number, shape, or color). The consequences change which dimension the testee attends to and sorts by. Rats attend to the light signaling which of two response alternatives will produce a reinforcer; people attend to the wheels on a slot-machine because they signal when money is won; researchers attend to which of the granting agencies is in political favor because that shapes the flavor of the application.
Reinforcement processes in ADHD
Forty years ago Wender suggested that reinforcers work differently in ADHD; a fact known implicitly by parents long before that landmark book. Numerous studies have investigated effects of reinforcers in ADHD, and although the findings are not entirely consistent, reinforcers seem to affect behavior differently in ADHD than in control subjects (see for a review). Rapid advances in neurobiology and genetics have produced compelling evidence for deficits in catecholamine functions in ADHD [13,14,17]. These findings, combined with research showing the importance of the catecholamines in memory and response selection processes [39,40,45,49-51], and especially of dopamine in behavioral acquisition based on reinforcement, support the early suggestion of a reinforcement deficit in ADHD [18-23,26,28,29,71].
Reinforcement and extinction processes are the fundamental mechanisms of behavioral selection. This process is in many ways similar to selection in genetics: "Within the lifetime of the individual organism, behavior is selected by its consequences, much as organisms are selected over generations by evolutionary contingencies". To survive, organisms must generate novel behavior with yet unforeseen consequences and be able to profit from experience by increasing the frequency of successful responses and eliminating unsuccessful or maladaptive behavior. Reinforcement will strengthen preceding behavior regardless of whether the behavior is correct or incorrect. A reinforcer presented after four incorrect responses followed by a correct response will strengthen both the incorrect responses as well as the correct response. However, because reinforcers are presented contingent on successful (correct) and not on unsuccessful (incorrect) responses, only correct responses will consistently precede reinforcers. Hence, in the long run, correct responses will be strengthened substantially more than the other responses. Mechanisms of behavioral selection must be sensitive to contextual factors; adaptive behavior in one context may not be adaptive in another. Habits, skills, and beliefs are sedulously built from simple behavioral units to longer behavioral sequences that come under the control of environmental stimuli (stimulus control) as reinforcers are delivered in some situations and not in others [17,91-93].
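The selection argument above can be illustrated with a toy simulation (all parameters hypothetical): every response in the run-up to a reinforcer is strengthened, correct or not, but because only correct responses always precede the reinforcer, their strength pulls ahead over many episodes.

```python
import random

# Toy behavioral-selection simulation: reinforcement "spills over" onto
# all responses in the bout that preceded it, yet the response the
# reinforcer is contingent on is the only one strengthened on every
# single episode, so it dominates in the long run.

random.seed(0)
strength = {"correct": 0.0, "incorrect": 0.0}

for episode in range(1000):
    # A bout of responding: some responses of either kind, then the
    # correct response that actually produces the reinforcer.
    bout = [random.choice(["correct", "incorrect"]) for _ in range(4)]
    bout.append("correct")            # reinforcer is contingent on "correct"
    for response in bout:             # every preceding response is strengthened
        strength[response] += 1.0

assert strength["correct"] > strength["incorrect"]
```

The incorrect responses are reinforced only by chance proximity, roughly half the time per slot, while the correct response gains on every episode; differential contiguity with the reinforcer does the selecting.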
Human behavior is sometimes controlled, not by reinforcement contingencies, but by verbally formulated rules about the reinforcement contingencies and what the person believes is the correct/incorrect behavior. In these cases, the rules (Bacon's Idols of the Marketplace) may prevent behavior from making contact with the real contingencies of reinforcement.
A steepened delay gradient in ADHD – symptoms
The symptoms observed in ADHD have been explained as an executive dysfunction [33,94,95], as a non-optimal mental energy state [96,97], as delay aversion linked to motivational deficits, and as a cognitive inhibitory deficit [26,95,98]. Above, we have described how the three-factor rule of reinforcement relevant at the cellular level can be translated into the delay-of-reinforcement gradient operating at a behavioral level. The delay-of-reinforcement gradient provides a way to describe how reinforcement processes are altered in ADHD. Changes in these behavioral selection mechanisms will inevitably produce behavioral changes. A steepened delay-of-reinforcement gradient can make sense of many of the behavioral symptoms associated with ADHD.
Several hypotheses and theories have been proposed on how reinforcement processes are altered in ADHD relative to normally developing children [18-23,26,28,29,31,71]. The dynamic developmental theory of ADHD posits that dopamine hypofunction in ADHD narrows the time window for associating predictive stimuli with behavior and its consequences [17,29]. This narrowed time window entails a steepened delay-of-reinforcement gradient. However, as previously described, it is events in time and not time itself that drives the delay-of-reinforcement gradient. Also, as shown in previous sections, both attention deficits and more rapid memory decay may cause steepening of the delay-of-reinforcement gradient in ADHD. These perspectives represent an extension of the dynamic developmental theory.
Due to the association between dopamine and LTD, the theory also proposes that extinction processes are depressed in ADHD, causing a slowed or deficient elimination of previously reinforced behavior. Altered reinforcement learning described by a steepened delay-of-reinforcement gradient combined with deficient extinction can produce the main symptoms of ADHD: inattention, hyperactivity, and impulsivity, as well as increased behavioral variability [17,29,92,93,100-102]. Slowed learning of discriminative stimuli due to the steepened delay-of-reinforcement gradient leads to a weaker control of behavior by contextual cues: Behavior is not controlled over extended periods of time by the discriminative stimulus and may be inappropriate for the current situation. This may be observed as symptoms of deficient sustained attention (e.g. forgetful in daily activities; easily distracted; fails to finish schoolwork, chores, or duties in the workplace).
Reinforcers also strengthen the temporal relation between consecutive responses or behavioral elements. A steepened delay-of-reinforcement gradient implies that mainly fast response sequences are reinforced. Hence, hyperactivity is suggested to be caused by the reinforcement of bursts of correct responses combined with deficient extinction of non-functional or incorrect behavior. Further, a steepened delay-of-reinforcement gradient signifies that delayed consequences of behavior have less effect in children with ADHD than in normal controls. Thus, poorer control of behavior and less effective learning would be expected with delayed reinforcement compared to that seen in individuals without ADHD. This prediction is consistent with the preference for immediate reinforcers reported in children with ADHD compared to normal controls [16,24-26,104].
The dynamic developmental theory of ADHD suggests that changes in fundamental behavioral selection mechanisms slow the association ("chunking") of simple response units into longer, more elaborate chains of adaptive behavioral elements that can function as higher-order response units [17,92,93,102]. When response units are chunked together into a chain, one response unit reliably precedes the next and there is a high degree of predictability within the response chain. Deficient or slowed chunking of behavior means that the reliable and predictable pattern of responses is absent, resulting in the increased intra-individual variability observed in ADHD [92,93,105,106].
The operant principles used to explain ADHD behavior in terms of a steepened delay-of-reinforcement gradient offer some suggestions on how to optimize learning in individuals with ADHD. These general suggestions are based on operant theory and on empirical findings from studies of animals as well as humans. However, while these suggestions may be highly relevant for clinical interventions in ADHD, they are not specific to ADHD, nor have all of them been tested in this disorder.
A steepened delay-of-reinforcement gradient hampers learning and may lie at the core of the behavioral changes seen in ADHD. Interventions aimed at making the delay-of-reinforcement gradient functionally shallower will improve learning and reduce ADHD symptoms. The gradients become functionally shallower – have greater ability to capture more remote events – if: (1) there is minimal post-event interference; (2) the event persists – stimuli bridging the delay in the case of stimulus events, repetitive responses in the case of response events – so that later or similar parts of the event are close to reinforcement; (3) the event is marked for special attention; and (4) it precedes other events which have themselves become conditioned reinforcers.
1. Minimizing post-event interference
Post-event interference can be minimized by provision of a minimally disruptive context, or by the subject's ability to focus on a relevant subset of the environment. Retroactive interference is equally disruptive in human and non-human animals. It is demonstrated in Figure 2 by the similarity of the two forgetting functions on an event axis; on a real-time axis the condition with the brief time between stimuli (inter-stimulus interval, ISI) would appear to decay at about twice the rate of the long-ISI condition. The difference is that nothing was happening during the longer ISIs to disrupt memory, and any decay there was occurring at a much slower rate. Individuals with deficits in the ability to allocate attention – whether toward long-term goals or simply away from immediate temptations – will be especially subject to interference, and will therefore evidence steeper delay-of-reinforcement gradients. A major clinical challenge is, of course, to increase the subject's focus on relevant stimuli and minimize the disruptive context. Enhancing the salience of stimuli, e.g. by the use of colors (see below), may increase the focus on relevant environmental factors. Additionally, and consistent with established educational practice, breaking up tasks into small and manageable parts may reduce the effects of a disruptive context and lead to improved learning and performance.
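The contrast between the event axis and the real-time axis can be sketched numerically. In this sketch, memory strength is assumed to drop by a fixed fraction per intervening event rather than per unit of time; the decay fraction and ISI values are invented for illustration and are not taken from Figure 2.

```python
import math

# Sketch of event-driven forgetting: strength falls by a fixed fraction per
# intervening event, not per unit of time (decay fraction is illustrative).
decay_per_event = 0.8

def strength(n_events):
    return decay_per_event ** n_events

# Two conditions with the same five intervening events end at the same
# strength on an event axis...
same = strength(5)
short_isi_time = 5 * 1.0   # five events, 1 s apart
long_isi_time = 5 * 2.0    # five events, 2 s apart

# ...but on a real-time axis the short-ISI condition appears to decay about
# twice as fast, because the same loss is packed into half the time.
rate_short = -math.log(same) / short_isi_time
rate_long = -math.log(same) / long_isi_time
print(round(rate_short / rate_long, 1))   # -> 2.0
```

The apparent doubling of the decay rate falls directly out of the assumption that events, not seconds, drive the loss.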
2. Creating robust memory traces
In delay conditioning, a stimulus remains continuously present during the delay to reinforcement. In trace conditioning, a stimulus is only briefly presented, then removed during the delay. The former is many times more effective over moderate and long delays than the latter. Thus, from a practical perspective, providing cues or stimuli that are continuously present during the delay to reinforcement may reduce demand on memory and improve learning and performance. Some memorial tactics essentially turn trace conditioning into delay conditioning, thus bridging the temporal gap. The extent to which individuals can do this constitutes their working memory capacity. Repetition of the event – until it is needed, can be used, can be written down, or seeps into long-term memory – helps keep the memory of the stimulus alive for association. We repeat new phone numbers until a pen is found. Prey animals often keep the stimulus alive by paradoxical "stalking" of a predator, keeping it in sight so that memory of its presence will not be over-written by foraging. Neonates may bridge the gap with repeated stereotyped movements appropriate to the conditioned stimulus. These tactics mark the trail through the labyrinth of Figure 1 from the initial event to the eventual reinforcer, just as seasoned hikers will periodically turn around to make a mental image of the trail they must choose to find their way back home.
3. Attentional loading
Novel events catch our attention. Such attention can be enhanced by remarking on the novelty, as we might repeat the name of a new acquaintance when first introduced. A subsequent reinforcer then has the highlighted memory on which to work. This tactic winnows the candidate paths by weighting a particular event more heavily than contemporaneous stimuli. Unpredicted events are noticed; unpredicted reinforcers capture attention and cause learning. Predicted events and reinforcers are not further associated, and fade from attention. This gamble of vesting attention is often successful, but is never without risk, since distraction from relevant stimuli will hamper learning. Attention is the gatekeeper that decides which events enter memory to be candidates for reinforcement. Changes in attention processes will affect the shape of the delay-of-reinforcement gradient. Thus, attention deficits in ADHD may be the primary culprit behind many of the other symptoms. However, attention is itself a behavior that is modifiable by reinforcement [84,85]. Thus, in the dynamic system that is a developing human being, the cause-effect status of attention versus reinforcement and learning is a chicken-and-egg problem; deficits in either will cause problems for the other, and interventions that help one will improve the other.
A tactic that is sometimes useful in determining the cause of a reinforcer post-hoc is to reduce the number of candidate events. We will often "concentrate": Become quiet and focused in our attention. Then we replicate as best we can a subset of potential candidates. The car seems to make a noise when we turn a corner. The next corner we turn, we do so with full attention: Does it replicate? Is it the steering gear or the mudguard? Does it happen in the other direction? This is at the heart of science: Minimize distraction and confounds, control and replicate, with careful documentation of the variables that were manipulated. Individuals with compromised attentional abilities may learn some of these skills as they mature, buffering the severity of those deficits. White has convincingly argued that remembering is best considered as discrimination at the time of retrieval; events that are more likely to be reinforced support better discriminations.
Another memorial tactic is to bias the search toward events that might typically produce the reinforcer. Hume noted, and psychological research has validated, that similarity strengthens association, as does spatial proximity. Indeed, the fan-out in Figure 1 is really a function of space-time. The larger the spatial context that must be considered, the more events must be processed at each step. This is vividly demonstrated in an experiment by Williams, who trained rats to lever-press with a 30 s delay between the response and reinforcement. The Marking group (triangles in Figure 3) received a 5 s chamber illumination immediately after the lever-press; the Blocking group (filled circles) received a 5 s chamber illumination immediately before the reinforcement; the control group received no chamber illumination. Figure 3 shows the results of his experiment.
Data from Williams showing the efficiency of learning under two manipulations of attention and a control condition. Reproduced with permission of the author and the Psychonomic Society.
The results are remarkable in two ways. In the case of blocking, a prominent stimulus essentially absorbed all of the credit for reinforcement, leaving none to strengthen the originating response. A simple-minded application of the principle of conditioned reinforcement – "Here's a nice reward for you, Johnny" – might effectively undermine the strengthening of the very response it was intended to enhance! Reminding the individual of the relevant response at the time of reinforcement can restore some of that potency. In the case of marking, the results endorse the wisdom of the adage "Catch them being good". It is likely that much of the efficacy of what we call conditioned reinforcement is due, not to conditioned reinforcement, but to marking. Marking relevant stimuli is especially important for individuals with attentional deficits. Some protocols for helping children learn involve gesture, voice modulation, and visual marking to increase the salience of relevant information, and precueing the desired behavior at the point of performance, which then permits immediate feedback – reinforcement – that can be integrated with the target behavior. Rowe encourages the use of stimulus dimensions, such as color, that increase the salience of the discriminative stimuli.
4. Backward chaining
A leading model of conditioning, the temporal-difference (TD) model, has proven successful in machine-learning instantiations and has been seminal in the study of brain correlates of learning. This model essentially vests a proportion of the reinforcing strength of the primary reinforcer in each of the states that precede it, one step at a time on each occasion of conditioning. Such backward chaining prunes the causal net of Figure 1. It is a classic approach to establishing long sequences of behavior. Due to a steepened delay-of-reinforcement gradient, children with ADHD may have problems chaining responses into adaptive behavioral sequences where the elements in the sequence are linked together and function as a higher-order response unit (e.g. have difficulties finishing schoolwork, chores, or duties in the workplace without a long "to do list" or reminder notes). From an applied perspective when working with children, backward chaining and other behavioral techniques aimed at building or increasing sequences of behavior may be useful in ADHD and other developmental disorders in one-on-one settings. This strategy is often inconvenient to use for the rapid transmission of information in classroom settings. However, effective educational programs have been developed where sequences of behavior are built through the use of a strategy termed "scaffolding" where the teacher models, prompts, and reinforces behavior in a step-by-step fashion until the child performs the whole sequence independently, accurately, and fluently. Scaffolding is a component in the Tools of the Mind curriculum which has been shown to enhance learning, executive functioning, and development in preschool children [119,120].
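The backward vesting of credit can be sketched with a minimal tabular TD(0) learner on a short chain of states. This is a generic illustration of the mechanism, not an implementation of any model cited here; the number of states, learning rate, and discount factor are arbitrary choices.

```python
# Minimal tabular TD(0) sketch: a chain of states s0 -> s1 -> ... -> s4,
# with a primary reinforcer delivered at the end of the chain.
alpha, gamma, n_states = 0.5, 0.9, 5   # learning rate, discount, chain length
V = [0.0] * n_states                   # learned value (credit) of each state

for episode in range(200):
    for s in range(n_states):
        reward = 1.0 if s == n_states - 1 else 0.0   # reinforcer at chain end
        v_next = 0.0 if s == n_states - 1 else V[s + 1]
        # TD error: mismatch between predicted and observed value
        V[s] += alpha * (reward + gamma * v_next - V[s])

# Credit has propagated backward: each earlier state holds a discounted
# share of the terminal reinforcer's strength.
print([round(v, 2) for v in V])   # -> [0.66, 0.73, 0.81, 0.9, 1.0]
```

Early in training only the state adjacent to the reinforcer gains value; over repeated episodes the credit creeps backward one step at a time, which is exactly the pruning of the causal net described above.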
Challenges for reinforcement theories of ADHD
Given the heterogeneity of ADHD findings, it is unlikely that any one theory can explain all cases of ADHD. Nevertheless, theories of ADHD should enable the integration of data from behavioral, genetic, neurobiological, cognitive, and clinical studies of ADHD. Reinforcement theories can explain many of the symptoms associated with ADHD and link these behavioral changes to changes at genetic and neurobiological levels through deficiencies in how the neuromodulator dopamine works. In this paper, we have also shown how cognitive processes like memory and attention are linked to the effects of reinforcers and may lie at the base of the suggested steepened delay-of-reinforcement gradient in ADHD. However, it is sometimes forgotten that "top-down control of behavior" is also acquired through learning. Cognitive processes like working memory, attention, and executive functions do not represent permanent traits of the individual, but are processes that can be significantly improved by training [84,85,121-125]. These findings attest to the importance of the environment in shaping and maintaining these functions. Hence, the primacy of these cognitive functions versus basic learning mechanisms and the directionality of cause and effect in ADHD need to be further studied.
A challenge for reinforcement theories of ADHD is to link the concepts of memory and attention used in our analyses of behavior to the corresponding concepts used in cognitive psychology. ADHD is associated with cognitive deficits including working memory impairment [32,33]. However, a precise translation from behavior to cognition requires a better operationalization of concepts such as short-term memory, long-term memory, working memory, encoding, storage, retrieval, and attention than is currently available.
Previous studies of reinforcement processes in ADHD have used a variety of experimental designs and methods, producing a fragmented research literature. The reinforcement universe is broad and includes several important dimensions like reinforcer density, reinforcer delay, reinforcer predictability, and reinforcer value. The research questions become yet more challenging when the influences of memory and attention processes are taken into account, as they must be. Future studies need to systematically explore the various mechanisms that can affect the delay-of-reinforcement gradient, whether they are functionally equivalent and produce similar symptom outcomes, or whether they give rise to endophenotypes that can be differentiated and identified. Exploring possible common causative mechanisms, like deficient memory processing, may provide an opportunity for the integration of a reinforcement deficit as a causative factor with the complex network of other causal factors suggested for ADHD [126-133].
The section on optimizing learning by minimizing post-event interference and increasing attentional loading by marking of events suggests future studies of ADHD. The effects of time versus intervening events on memory decay and reinforcement effects in ADHD compared to normal controls can be tested using Killeen's procedure modified for human subjects. Further, results from this procedure can be compared with data from delayed-matching-to-sample studies, i.e. time-driven memory decay, and studies of effects of interference in ADHD, i.e. event-driven memory decay (post-event interference – events occurring after the to-be-remembered event – should not be confused with the interference tested by Stroop tasks, i.e. slowed response times due to the suppression of an automated response). Additionally, reinforcer valence/magnitude can be varied to test whether the obtained decay functions are independent of it. The importance of attentional loading for memory and reinforcement effects can be tested by varying the salience of the stimulus used for response marking, varying the temporal relation between the response and the marking stimulus, and possibly also combining response marking with reinforcer delay to explore the memory decay of the marked response.
Reinforcement theories of ADHD need to explain not only the development of symptoms and the relation to other levels of description, but also the improvement of behavior following psychostimulant treatment. A challenge for such theories is that the symptom-reducing effects of central stimulants in ADHD seem too rapid in onset to be plausibly attributed to learning. Further, if drugs improve learning, then behavioral improvement should be long-lasting. However, the major beneficial effects of the drug wear off within hours, and correction of a learning deficit per se may seem an unlikely mechanism for these drugs' therapeutic actions. However, any medication that alters a reinforcer's effectiveness will shift the relative likelihoods of different classes of behavior, potentially producing rapid behavioral changes. In this sense, medication does not supply what the child has failed to learn in the past; it merely makes the child more able to attend and control his behavior while medicated. This assumes that appropriate behavior is in the repertoire of children with ADHD, but is not produced under the prevailing motivation or reinforcement contingencies. This is consistent with the observations that children with ADHD show adequate behavior under some reinforcement contingencies (continuous and immediate reinforcement) but not under others (partial and delayed reinforcement), and with the clinical notion that ADHD is "not a problem of knowing what to do, but one of doing what you know".
The three-factor rule describes an important principle underlying discriminated response learning at a synaptic level. Synaptic strengthening depends on the convergence of dopaminergic synapses (representing reinforcers) and corticostriatal synapses (representing the stimulus situation) upon individual neostriatal neurons (representing behavior) [56,57]. The three-factor rule can be translated into the delay-of-reinforcement gradient which is a concept operating at a behavioral level. Alterations in reinforcement processes in ADHD may be described by a steepened delay-of-reinforcement gradient which can explain the development of symptoms of inattention, hyperactivity, and impulsivity associated with ADHD [17,29]. The shape of the delay-of-reinforcement gradient is influenced by several processes, in particular attention and memory. Theoretical and experimental analyses of these factors are important to determine if and how reinforcement processes are altered in ADHD. Such analyses could also promote the collaboration between research groups, facilitate an integration of the ADHD research field, and ultimately lead to improved treatment strategies and intervention programs for ADHD.
The authors have no competing interests and are listed in approximate order of individual contribution to the manuscript.
All authors contributed to discussions, helped to draft the manuscript, and read and approved the final version of the manuscript.
The delay of reinforcement gradient as diffusion of attention
If 4 states must be considered for credit as a cause of a reinforcing or punishing event at each step of a sequence of prior events, for n steps there are 4^n paths; for six states, 6^n; and for a states, a^n. If the total credit available is c, and if it is to be evenly distributed, then the credit allocated to each path is ca^(-n). For consistency with traditional continuous models, replace a with the base e = 2.718... and n with λt, with t the time until reinforcement. The rate constant lambda (λ) is then the natural logarithm of the number of states evaluated per second (λ = ln[a]). If the total credit available is c, and if it is evenly distributed, then the credit allocated to each path must decay as:
s = cλe^(-λt)     (1)
Note that λ re-emerges as a coefficient in Equation 1. That is because, under the assumption of constant capacity c, the area under Equation 1 is conserved, so that its integral must equal c independent of the rate of allocation of attention. Equation 1 satisfies that assumption; steepening of the gradient associated with increases in λ will also increase its intercept (cλ). Other assumptions are possible. Suppose, for instance, that an initial credit c is depleted at a rate of λ, with no assumption of invariance of capacity over its rate of allocation. Then the appropriate model is:
s = ce^(-λt)     (2)
In this case the function is "hinged" at an intercept of c. It is an empirical question which of these models is most relevant to research on ADHD. Because capacity c is often a free parameter, the difference between the two models is blunted by the models' ability to absorb λ into c: c' = cλ. The test will be to see whether, by varying the number of states or their rate of presentation, the resulting changes in λ are correlated with changes in c. If Model 1 is correct but Equation 2 is used, then there should be a positive correlation between c and λ.
If the exhaustion of credit is modeled by Equations 1 or 2, those equations, ceteris paribus, also tell us how strongly a remote event is likely to be associated with reinforcement. But they do not tell the whole story, because they leave out the factors of marking, similarity and context. Modifications of this model are straightforward, but await relevant data.
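The two capacity assumptions can be checked numerically: under Model 1 the area under the gradient stays at c for any λ, whereas under Model 2 it is c/λ. The values of c and λ below are arbitrary, chosen only to exhibit the contrast.

```python
import math

# Model 1: s(t) = c*lam*exp(-lam*t)  (fixed capacity: area under curve = c)
# Model 2: s(t) = c*exp(-lam*t)      (hinged at intercept c: area = c/lam)
def model1(t, c, lam): return c * lam * math.exp(-lam * t)
def model2(t, c, lam): return c * math.exp(-lam * t)

def area(f, c, lam, dt=0.001, t_max=50.0):
    # Crude left-Riemann integral, sufficient to show the conservation property.
    n = int(t_max / dt)
    return sum(f(i * dt, c, lam) for i in range(n)) * dt

c = 1.0
for lam in (0.5, 1.0, 2.0):
    # Model 1 column stays ~1.0 (= c) for every lam; Model 2 column is c/lam.
    print(lam, round(area(model1, c, lam), 2), round(area(model2, c, lam), 2))
```

Steepening λ under Model 1 raises the intercept cλ exactly enough to conserve the area, which is the invariance discussed above.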
This article is part of the international and interdisciplinary project "ADHD: From genes to therapy" (Project leader: Terje Sagvolden) at the Centre for Advanced Study at the Norwegian Academy of Science and Letters in Oslo, Norway (2004–2005), in which all the authors were participants.
- American Psychiatric Association. Diagnostic and statistical manual of mental disorders, text revision (DSM-IV-TR). 4th edition. Washington, DC: American Psychiatric Association; 2000.
- Dwivedi KN, Banhatti RG. Attention deficit/hyperactivity disorder and ethnicity. Arch Dis Child. 2005;90:i10–i12. doi: 10.1136/adc.2004.058180.
- Meyer A, Eilertsen DE, Sundet JM, Tshifularo JG, Sagvolden T. Cross-cultural similarities in ADHD-like behaviour amongst South African primary school children. S Afr J Psychol. 2004;34:123–139.
- Rohde LA, Szobot C, Polanczyk G, Schmitz M, Martins S, Tramontina S. Attention-deficit/hyperactivity disorder in a diverse culture: do research and clinical findings support the notion of a cultural construct for the disorder? Biol Psychiatry. 2005;57:1436–1441. doi: 10.1016/j.biopsych.2005.01.042.
- Polanczyk G, de Lima MS, Horta BL, Biederman J, Rohde LA. The worldwide prevalence of ADHD: a systematic review and metaregression analysis. Am J Psychiatry. 2007;164:942–948. doi: 10.1176/appi.ajp.164.6.942.
- Kessler RC, Adler L, Barkley R, Biederman J, Conners CK, Demler O, et al. The prevalence and correlates of adult ADHD in the United States: results from the National Comorbidity Survey Replication. Am J Psychiatry. 2006;163:716–723. doi: 10.1176/appi.ajp.163.4.716.
- Barkley RA, Fischer M, Smallish L, Fletcher K. Young adult follow-up of hyperactive children: antisocial activities and drug use. J Child Psychol Psychiatry. 2004;45:195–211. doi: 10.1111/j.1469-7610.2004.00214.x.
While the previous chapter deals with the ways in which computers and algorithms could support existing practices of biological research, this chapter introduces a different type of opportunity. The quantities and scopes of data being collected are now far beyond the capability of any human, or team of humans, to analyze. And as the sizes of the datasets continue to increase exponentially, even existing techniques such as statistical analysis begin to suffer. In this data-rich environment, the discovery of large-scale patterns and correlations is potentially of enormous significance. Indeed, such discoveries can be regarded as hypotheses asserting that the pattern or correlation may be important—a mode of “discovery science” that complements the traditional mode of science in which a hypothesis is generated by human beings and then tested empirically.
For exploring this data-rich environment, simulations and computer-driven models of biological systems are proving to be essential.
5.1. ON MODELS IN BIOLOGY
In all sciences, models are used to represent, usually in an abbreviated form, a more complex and detailed reality. Models are used because in some way, they are more accessible, convenient, or familiar to practitioners than the subject of study. Models can serve as explanatory or pedagogical tools, represent more explicitly the state of knowledge, predict results, or act as the objects of further experiments. Most importantly, a model is a representation of some reality that embodies some essential and interesting aspects of that reality, but not all of it.
Because all models are by definition incomplete, the central intellectual issue is whether the essential aspects of the system or phenomenon are well represented (the term “essential” has multiple meanings depending on what aspects of the phenomenon are of interest). In biological phenomena, what is interesting and significant is usually a set of relationships—from the interaction of two molecules to the behavior of a population in its environment. Human comprehension of biological systems is limited, among other things, by that very complexity and by the problems that arise when attempting to dissect a given system into simpler, more easily understood components. This challenge is compounded by our current inability to understand relationships between the components as they occur in reality, that is, in the presence of multiple, competing influences and in the broader context of time and space.
Different fields of science have traditionally used models for different purposes; thus, the nature of the models, the criteria for selecting good or appropriate models, and the nature of the abbreviation or simplification have varied dramatically. For example, biologists are quite familiar with the notion of model organisms.1 A model organism is a species selected for genetic experimental analysis on the basis of experimental convenience, homology to other species (especially to humans), relative simplicity, or other attractive attributes. The fruit fly Drosophila melanogaster is a model organism attractive at least in part because of its short generational time span, allowing many generations in the course of an experiment.
At the most basic level, any abstraction of some biological phenomenon counts as a model. Indeed, the cartoons and block diagrams used by most biologists to represent metabolic, signaling, or regulatory pathways are models—qualitative models that lay out the connectivity of elements important to the phenomenon. Such models throw away details (e.g., about kinetics), implicitly asserting that omission of such details does not render the model irrelevant.
A second example of implicit modeling is the use of statistical tests by many biologists. All statistical tests are based on a null hypothesis, and all null hypotheses are based on some kind of underlying model from which the probability distribution of the null hypothesis is derived. Even those biologists who have never thought of themselves as modelers are using models whenever they use statistical tests.
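A permutation test makes this point concrete: the "null hypothesis" is literally a model, here the model that group labels are exchangeable, and the reference distribution is derived directly from that model. The data below are invented for illustration.

```python
import random

# Two invented groups of measurements; the null model says their labels
# are exchangeable, so we derive the null distribution by relabeling.
random.seed(0)
a = [2.1, 2.5, 2.3, 2.8, 2.6]
b = [1.9, 1.8, 2.0, 2.2, 1.7]

def mean(xs): return sum(xs) / len(xs)
observed = mean(a) - mean(b)

pooled = a + b
count = 0
n_perm = 10000
for _ in range(n_perm):
    random.shuffle(pooled)                        # draw from the null model
    diff = mean(pooled[:5]) - mean(pooled[5:])
    if diff >= observed:
        count += 1
p_value = count / n_perm                          # one-sided p-value
print(p_value < 0.05)
```

Every choice in the test (exchangeability, the statistic, one-sidedness) is a modeling choice, whether or not the user thinks of it that way.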
Mathematical modeling has been an important component of several biological disciplines for many decades. One of the earliest quantitative biological models involved ecology: the Lotka-Volterra model of species competition and predator-prey relationships described in Section 5.2.4. In the context of cell biology, models and simulations are used to examine the structure and dynamics of a cell or organism's function, rather than the characteristics of isolated parts of a cell or organism.2 Such models must consider stochastic and deterministic processes, complex pleiotropy, robustness through redundancy, modular design, alternative pathways, and emergent behavior in biological hierarchy.
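As a minimal illustration of such a quantitative model, the Lotka-Volterra predator-prey equations can be stepped forward with a crude Euler scheme; the parameter values and initial populations below are illustrative, not fitted to any dataset.

```python
# Euler-step sketch of the Lotka-Volterra predator-prey model:
#   dprey/dt = alpha*prey - beta*prey*pred
#   dpred/dt = delta*prey*pred - gamma*pred
alpha, beta, delta, gamma = 1.0, 0.1, 0.075, 1.5   # illustrative rates
prey, pred = 10.0, 5.0                             # illustrative start
dt = 0.001

for step in range(20000):                          # simulate 20 time units
    dprey = (alpha * prey - beta * prey * pred) * dt
    dpred = (delta * prey * pred - gamma * pred) * dt
    prey += dprey
    pred += dpred

# Both populations oscillate around the equilibrium yet persist.
print(prey > 0 and pred > 0)
```

Even this toy version exhibits the qualitative behavior (coupled oscillations) that made the model historically influential in ecology.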
In a cellular context, one goal of biology is to gain insight into the interactions, molecular or otherwise, that are responsible for the behavior of the cell. To do so, a quantitative model of the cell must be developed to integrate global organism-wide measurements taken at many different levels of detail.
The development of such a model is iterative. It begins with a rough model of the cell, based on some knowledge of the components of the cell and possible interactions among them, as well as prior biochemical and genetic knowledge. Although the assumptions underlying the model are insufficient and may even be inappropriate for the system being investigated, this rough model then provides a zeroth-order hypothesis about the structure of the interactions that govern the cell's behavior.
Implicit in the model are predictions about the cell's response under different kinds of perturbation. Perturbations may be genetic (e.g., gene deletions, gene overexpressions, undirected mutations) or environmental (e.g., changes in temperature, stimulation by hormones or drugs). Perturbations are introduced into the cell, and the cell's response is measured with tools that capture changes at the relevant levels of biological information (e.g., mRNA expression, protein expression, protein activation state, overall pathway function). Box 5.1 provides some additional detail on cellular perturbations.
Box 5.1. Perturbation of Biological Systems. Perturbation of biological systems can be accomplished through a number of genetic mechanisms, such as high-throughput genomic manipulation: increasingly inexpensive and highly standardized tools are …

The next step is comparison of the model's predictions to the measurements taken. This comparison indicates where and how the model must be refined in order to match the measurements more closely. If the initial model is highly incomplete, measurements can be used to suggest the particular components required for cellular function and those that are most likely to interact. If the initial model is relatively well defined, its predictions may already be in good qualitative agreement with measurement, differing only in minor quantitative ways. When model and measurement disagree, it is often necessary to create a number of more refined models, each incorporating a different mechanism underlying the discrepancies in measurement.
With the refined model(s) in hand, a new set of perturbations can be applied to the cell. Note that new perturbations are informative only if they elicit different responses between models, and they are most useful when the predictions of the different models are very different from one another. Nevertheless, a new set of perturbations is required because the predictions of the refined model(s) will generally fit well with the old set of measurements.
The refined model that best accounts for the new set of measurements can then be regarded as the initial model for the next iteration. Through this process, model and measurement are intended to converge in such a way that the model's predictions mirror biological responses to perturbation. Modeling must be connected to experimental efforts so that experimentalists will know what needs to be determined in order to construct a comprehensive description and, ultimately, a theoretical framework for the behavior of a biological system. Feedback is very important, and it is this feedback, along with the global—or, loosely speaking, genomic-scale—nature of the inquiry that characterizes much of 21st century biology.
5.2. WHY BIOLOGICAL MODELS CAN BE USEFUL
In the last decade, mathematical modeling has gained stature and wider recognition as a useful tool in the life sciences. Most of this revolution has occurred since the era of the genome, in which biologists were confronted with massive challenges to which mathematical expertise could successfully be brought to bear. Some of the success, though, rests on the fact that computational power has allowed scientists to explore ever more complex models in finer detail. This means that the mathematician's talent for abstraction and simplification can be complemented with realistic simulations in which details not amenable to analysis can be explored. The visual real-time simulations of modeled phenomena give more compelling and more accessible interpretations of what the models predict.3 This has made it easier to earn the recognition of biologists.
On the other hand, modeling—especially computational modeling—should not be regarded as an intellectual panacea, and models may prove more hindrance than help under certain circumstances. In models with many parameters, the state space to be explored may grow combinatorially fast so that no amount of data and brute force computation can yield much of value (although it may be the case that some algorithm or problem-related insight can reduce the volume of state space that must be explored to a reasonable size). In addition, the behavior of interest in many biological systems is not characterized as equilibrium or quasi-steady-state behavior, and thus convergence of a putative solution may never be reached. Finally, modeling presumes that the researcher can both identify the important state variables and obtain the quantitative data relevant to those variables.4
Computational models apply to specific biological phenomena (e.g., organisms, processes) and are used for a number of purposes as described below.
5.2.1. Models Provide a Coherent Framework for Interpreting Data
A biologist surveys the number of birds nesting on offshore islands and notices that the number depends on the size (e.g., diameter) of the island: the larger the diameter d, the greater is the number of nests N. A graph of this relationship for islands of various sizes reveals a trend. Here the mathematically informed and uninformed part ways: a simple linear least-squares fit of the data misses a central point. A trivial “null model” based on an equal subdivision of area between nesting individuals predicts that N ~ d^2 (i.e., the number of nests should be roughly proportional to the area of the island). This simple geometric property relating area to population size gives a strong indication of the trend researchers should expect to see. Departures from this trend would indicate that something else may be important. (For example, different parts of islands are uninhabitable, predators prefer some islands to others, and so forth.)
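The null model above can be sketched in a few lines: fit both N ~ d^2 and a naive straight line N ~ d to survey data and compare the fits. The island diameters and nest counts below are invented for illustration, not real survey data.

```python
# Hypothetical illustration: compare the area-based "null model" N = a * d^2
# with a naive linear fit N = b * d. All data values below are invented.

def fit_scale(xs, ys):
    """Least-squares scale factor a minimizing sum of (y - a*x)^2."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def sse(xs, ys, a):
    """Sum of squared errors for the fit y ~ a*x."""
    return sum((y - a * x) ** 2 for x, y in zip(xs, ys))

d = [1.0, 2.0, 3.0, 4.0, 5.0]   # island diameters (arbitrary units)
N = [2, 9, 19, 33, 52]          # nest counts, roughly quadratic in d

d2 = [x * x for x in d]
a_quad = fit_scale(d2, N)       # N ~ a * d^2  (area-based null model)
a_lin = fit_scale(d, N)         # N ~ b * d    (naive linear fit)

print(sse(d2, N, a_quad), sse(d, N, a_lin))  # quadratic model fits far better
```

Departures of real data from the quadratic trend, not the trend itself, would be the biologically interesting signal.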
Although the above example is elementary, it illustrates the idea that data are best interpreted within a context that shapes one's expectations regarding what the data “ought” to look like; often a mathematical (or geometric) model helps to create that context.
5.2.2. Models Highlight Basic Concepts of Wide Applicability
Among the earliest applications of mathematical ideas to biology are those in which population levels were tracked over time and attempts were made to understand the observed trends. Malthus proposed in 1798 the fitting of population data to exponential growth curves following his simple model for geometric growth of a population.5 The idea that simple reproductive processes produce exponential growth (if birth rates exceed mortality rates) or extinction (in the opposite case) is a fundamental principle: its applicability in biology, physics, chemistry, as well as simple finance, is central.
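Malthus's principle can be stated in a few lines of code: a population multiplies by a fixed factor each generation, growing when births exceed deaths and declining toward extinction otherwise. The birth and mortality rates below are illustrative.

```python
# Minimal sketch of Malthusian (geometric) growth: N(t+1) = (1 + b - m) * N(t),
# with per-generation birth rate b and mortality rate m. Numbers illustrative.

def project(n0, b, m, generations):
    n = n0
    for _ in range(generations):
        n *= (1.0 + b - m)
    return n

# Birth rate exceeding mortality gives exponential growth...
grow = project(100.0, b=0.3, m=0.1, generations=50)
# ...while the opposite case decays toward extinction.
shrink = project(100.0, b=0.1, m=0.3, generations=50)
print(grow, shrink)
```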
An important refinement of the Malthus model was proposed by Pierre-François Verhulst in 1838 to explain why most populations do not experience exponential growth indefinitely. The refinement was the idea of a density-dependent growth law, now known as the logistic growth model.6 Though simple, the Verhulst model is still widely used to represent population growth in many biological examples. Both the Malthus and Verhulst models relate observed trends to simple underlying mechanisms; neither model is fully accurate for real populations, but deviations from model predictions are, in themselves, informative, because they lead to questions about what features of the real systems are worthy of investigation.
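A minimal sketch of the logistic model: growth is nearly exponential while the population is small but saturates at the carrying capacity. The growth rate r, carrying capacity K, and initial population below are illustrative values.

```python
# Sketch of the logistic model dN/dt = r * N * (1 - N/K), integrated with a
# simple forward-Euler scheme. All parameter values are illustrative.

def logistic(n0, r, k, t_end, dt=0.001):
    n, t = n0, 0.0
    while t < t_end:
        n += dt * r * n * (1.0 - n / k)
        t += dt
    return n

K = 1000.0
early = logistic(10.0, r=0.5, k=K, t_end=1.0)    # still roughly exponential
late = logistic(10.0, r=0.5, k=K, t_end=40.0)    # saturates near capacity K
print(early, late)
```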
More recent examples of this sort abound. Nonlinear dynamics has elucidated the tendency of excitable systems (cardiac tissue, nerve cells, and networks of neurons) to exhibit oscillatory, burst, and wave-like phenomena. The understanding of the spread of disease in populations and its sensitive dependence on population density arose from simple mathematical models. The same is true of the discovery of chaos in the discrete logistic equation (in the 1970s). This simple model and its mathematical properties led to exploration of new types of dynamic behavior ubiquitous in natural phenomena. Such biologically motivated models often cross-fertilize other disciplines: in this case, the phenomenon of chaos was then found in numerous real physical, chemical, and mechanical systems.
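The sensitive dependence at the heart of the discrete logistic equation's chaos can be demonstrated directly: in the chaotic regime a microscopic difference between starting points is amplified to order one, while in a stable regime the same perturbation dies out. The parameter values of r below are illustrative.

```python
# The discrete logistic map x -> r*x*(1-x): a sketch of the chaos noted above.

def max_separation(x0, y0, r, steps):
    """Largest gap observed between two logistic-map trajectories."""
    x, y, worst = x0, y0, abs(x0 - y0)
    for _ in range(steps):
        x = r * x * (1.0 - x)
        y = r * y * (1.0 - y)
        worst = max(worst, abs(x - y))
    return worst

delta = 1e-6
chaotic = max_separation(0.2, 0.2 + delta, r=3.9, steps=50)  # chaotic regime
stable = max_separation(0.2, 0.2 + delta, r=2.5, steps=50)   # stable regime
print(chaotic, stable)   # the chaotic regime amplifies the tiny difference
```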
5.2.3. Models Uncover New Phenomena or Concepts to Explore
Simple conceptual models can be used to uncover new mechanisms that experimental science has not yet encountered. The discovery of chaos mentioned above is one of the clearest examples of this kind. A second example of this sort is Turing's discovery that two chemicals that interact chemically in a particular way (activate and inhibit one another) and diffuse at unequal rates could give rise to “peaks and valleys” of concentration. His analysis of reaction-diffusion (RD) systems showed precisely what ranges of reaction rates and rates of diffusion would result in these effects, and how properties of the pattern (e.g., distance between peaks and valleys) would depend on those microscopic rates. Later research in the mathematical community also uncovered how other interesting phenomena (traveling waves, oscillations) were generated in such systems and how further details of patterns (spots, stripes, etc.) could be affected by geometry, boundary conditions, types of chemical reactions, and so on.
Turing's theory was later given physical manifestation in artificial chemical systems, manipulated to satisfy the theoretical criteria of pattern formation regimes. And, although biological systems did not produce simple examples of RD pattern formation, the theoretical framework originating in this work motivated later more realistic and biologically based modeling research.
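The linear-stability calculation underlying Turing's criterion can be sketched compactly: about a uniform steady state, a spatial perturbation of wavenumber k grows if the matrix J - k^2·diag(Du, Dv) has an eigenvalue with positive real part. The Jacobian and diffusion coefficients below are illustrative numbers chosen to satisfy the classic conditions (stable without diffusion, unstable for a band of wavenumbers when the inhibitor diffuses faster).

```python
import math

# Sketch of Turing's linear-stability analysis for a two-species system,
# linearized about a uniform steady state. Numbers are illustrative only.

J = ((1.0, -2.0),
     (3.0, -4.0))      # reaction Jacobian: activator u, inhibitor v
Du, Dv = 1.0, 20.0     # inhibitor diffuses much faster than activator

def max_growth_rate(k2):
    """Largest real part of the eigenvalues of J - k^2 * diag(Du, Dv)."""
    a = J[0][0] - k2 * Du
    d = J[1][1] - k2 * Dv
    b, c = J[0][1], J[1][0]
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        return (tr + math.sqrt(disc)) / 2.0
    return tr / 2.0     # complex pair: real part is tr/2

well_mixed = max_growth_rate(0.0)    # stable without diffusion: negative
with_diffusion = max(max_growth_rate(0.01 * i) for i in range(1, 200))
print(well_mixed, with_diffusion)    # diffusion destabilizes some wavenumbers
```

The band of k values with positive growth rate sets the spacing of the resulting “peaks and valleys,” tying the macroscopic pattern to the microscopic rates.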
5.2.4. Models Identify Key Factors or Components of a System
Simple conceptual models can be used to gain insight, develop intuition, and understand “how something works.” For example, the Lotka-Volterra model of species competition and predator-prey7 is largely conceptual and is recognized as not being very realistic. Nevertheless, this and similar models have played a strong role in organizing several themes within the discipline: for example, competitive exclusion, the tendency for a species with a slight advantage to outcompete, dominate, and take over from less advantageous species; the cycling behavior in predator-prey interactions; and the effect of resource limitations on stabilizing a population that would otherwise grow explosively. All of these concepts arose from mathematical models that highlighted and explained dynamic behavior within the context of simple models. Indeed, such models are useful for helping scientists to recognize patterns and predict system behavior, at least in gross terms and sometimes in detail.
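The cycling behavior mentioned above can be seen in a minimal sketch of the Lotka-Volterra predator-prey equations, integrated with a small-step Euler scheme; all parameter values are illustrative.

```python
# Sketch of the Lotka-Volterra predator-prey model:
#   dx/dt = a*x - b*x*y,   dy/dt = c*b*x*y - m*y
# x = prey, y = predators. All parameter values are illustrative.

def lv_step(x, y, a, b, c, m, dt):
    dx = a * x - b * x * y
    dy = c * b * x * y - m * y
    return x + dt * dx, y + dt * dy

def prey_series(x0, y0, t_end, dt=0.0005):
    x, y = x0, y0
    xs = []
    for _ in range(int(t_end / dt)):
        x, y = lv_step(x, y, a=1.0, b=0.1, c=0.5, m=0.5, dt=dt)
        xs.append(x)
    return xs

prey = prey_series(20.0, 5.0, t_end=30.0)
peaks = sum(1 for i in range(1, len(prey) - 1)
            if prey[i - 1] < prey[i] > prey[i + 1])
print(peaks)   # the prey population cycles rather than settling to a constant
```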
5.2.5. Models Can Link Levels of Detail (Individual to Population)
Biological observations are made at many distinct hierarchies and levels of detail. However, the links between such levels are notoriously difficult to understand. For example, the behavior of single neurons and their response to inputs and signaling from synaptic connections might be well known. The behavior of a large assembly of such neurons in some part of the central nervous system can be observed macroscopically by imaging or electrode recording techniques. However, how the two levels are interconnected remains a massive challenge to scientific understanding. Similar examples occur in countless settings in the life sciences: due to the complexity of nonlinear interactions, it is nearly impossible to grasp intuitively how collections of individuals behave, what emergent properties of these groups arise, or the significance of any sensitivity to initial conditions that might be magnified at higher levels of abstraction. Some mathematical techniques (averaging methods, homogenization, stochastic methods) allow the derivation of macroscopic statements based on assumptions at the microscopic, or individual, level. Both modeling and simulation are important tools for bridging this gap.
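As a toy illustration of deriving a macroscopic statement from individual-level assumptions, consider many individuals each reproducing at random: the ensemble average follows the deterministic mean-field growth law even though any single realization fluctuates. All numbers below are illustrative.

```python
import random

# Individual-level rule: each individual reproduces with probability p per
# time step. The population-level expectation follows N(t) = N0 * (1 + p)**t.

random.seed(1)   # fixed seed so the sketch is reproducible

def run_population(n0, p, steps):
    n = n0
    for _ in range(steps):
        n += sum(1 for _ in range(n) if random.random() < p)
    return n

p, steps, n0 = 0.05, 20, 50
runs = [run_population(n0, p, steps) for _ in range(200)]
mean = sum(runs) / len(runs)
macro = n0 * (1.0 + p) ** steps   # macroscopic (mean-field) prediction
print(mean, macro)                # ensemble mean tracks the growth law
```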
5.2.6. Models Enable the Formalization of Intuitive Understandings
Models are useful for formalizing intuitive understandings, even if those understandings are partial and incomplete. What appears to be a solid verbal argument about cause and effect can be clarified and put to a rigorous test as soon as an attempt is made to formulate the verbal arguments into a mathematical model. This process forces a clarity of expression and consistency (of units, dimensions, force balance, or other guiding principles) that is not available in natural language. As importantly, it can generate predictions against which intuition can be tested.
Because they run on a computer, simulation models force the researcher to represent explicitly important components and connections in a system. Thus, simulations can only complement, but never replace, the underlying formulation of a model in terms of biological, physical, and mathematical principles. That said, a simulation model often can be used to indicate gaps in one's knowledge of some phenomenon, at which point substantial intellectual work involving these principles is needed to fill the gaps in the simulation.
5.2.7. Models Can Be Used as a Tool for Helping to Screen Unpromising Hypotheses
In a given setting, quantitative or descriptive hypotheses can be tested by exploring the predictions of models that specify precisely what is to be expected given one or another hypothesis. In some cases, although it may be impossible to observe a sequence of biological events (e.g., how a receptor-ligand complex undergoes sequential modification before internalization by the cell), downstream effects may be observable. A model can explore the consequences of each of a variety of possible sequences and help scientists to identify the most likely candidate for the correct sequence. Further experimental observations can then refine one's understanding.
5.2.8. Models Inform Experimental Design
Modeling properly applied can accelerate experimental efforts at understanding. Theory embedded in the model is an enabler for focused experimentation. Specifically, models can be used alongside experiments to help optimize experimental design, thereby saving time and resources. Simple models give a framework for observations (as noted in Section 5.2.1) and thereby suggest what needs to be measured experimentally and, indeed, what need not be measured—that is, how to refine the set of observations so as to extract optimal knowledge about the system. This is particularly true when models and experiments go hand-in-hand. As a rule, several rounds of modeling and experimentation are necessary to lead to informative results.
Carrying these general observations further, Selinger et al.8 have developed a framework for understanding the relationship between the properties of certain kinds of models and the experimental sampling required for “completeness” of the model. They define a model as a set of rules that maps a set of inputs (e.g., possible descriptions of a cell's environment) to a set of outputs (e.g., the resulting concentrations of all of the cell's RNAs and proteins). From these basic properties, Selinger et al. are able to determine the order of magnitude of the number of measurements needed to populate the space of all possible inputs (e.g., environmental conditions) with enough measured outputs (e.g., transcriptomes, proteomes) to make prediction feasible, thereby establishing how many measurements are needed to adequately sample input space to allow the rule parameters to be determined.
Using this framework, Selinger et al. estimate the experimental requirements for the completeness of a discrete transcriptional network model that maps all N genes as inputs to all N genes as outputs, in which the genes can take on three levels of expression (low, medium, and high) and each gene has, at most, K direct regulators. Applying this model to three organisms—Mycoplasma pneumoniae, Escherichia coli, and Homo sapiens—they find that 80, 40,000, and 700,000 transcriptome experiments, respectively, are necessary to fill out this model. They further note that the upper-bound estimate of experimental requirements grows exponentially with the maximum number of regulatory connections K per gene, although genes tend to have a low K, and that the upper-bound estimate grows only logarithmically with the number of genes N, making completeness feasible even for large genetic networks.
5.2.9. Models Can Predict Variables Inaccessible to Measurement
Technological innovation in scientific instrumentation has revolutionized experimental biology. However, many mysteries of the cell, of physiology, of individual or collective animal behavior, and of population-level or ecosystem-level dynamics remain unobservable. Models can help link observations to quantities that are not experimentally accessible. At the scale of a few millimeters, Marée and Hogeweg recently developed9 a computational model based on a cellular automaton for the behavior of the social amoeba Dictyostelium discoideum. Their model is based on differential adhesion between cells, cyclic adenosine monophosphate (cAMP) signaling, cell differentiation, and cell motion. Using detailed two- and three-dimensional simulations of an aggregate of thousands of cells, the authors showed how a relatively small set of assumptions and “rules” leads to a fully accurate developmental pathway. Using the simulation as a tool, they were able to explore which assumptions were blatantly inappropriate (leading to incorrect outcomes). In its final synthesis, the Marée-Hogeweg model predicts dynamic distributions of chemicals and of mechanical pressure in a fully dynamic simulation of the culminating Dictyostelium slug. Some, but not all, of these variables can be measured experimentally: those that are measurable are well reproduced by the model. Those that cannot (yet) be measured are predicted inside the evolving shape. What is even more impressive: the model demonstrates that the system has self-correcting properties and accounts for many experimental observations that previously could not be explained.
5.2.10. Models Can Link What Is Known to What Is Yet Unknown
In the words of Pollard, “Any cellular process involving more than a few types of molecules is too complicated to understand without a mathematical model to expose assumptions and to frame the reactions in a rigorous setting.”10 Reviewing the state of the field in cell motility and the cytoskeleton, he observes that even with many details of the mechanism as yet controversial or unknown, modeling plays an important role. Referring to a system (of actin and its interacting proteins) modeled by Mogilner and Edelstein-Keshet,11 he points to advantages gained by the mathematical framework: “A mathematical model incorporating molecular reactions and physical forces correctly predicts the steady-state rate of cellular locomotion.” The model, he notes, correctly identifies what limits the motion of the cell, predicts what manipulations would change the rate of motion, and thus suggests experiments to perform. While details of some steps are still emerging, the model also distinguishes quantitatively between distinct hypotheses for how actin filaments are broken down for purposes of recycling their components.
5.2.11. Models Can Be Used to Generate Accurate Quantitative Predictions
Where detailed quantitative information exists about components of a system, about underlying rules or interactions, and about how these components are assembled into the system as a whole, modeling may be valuable as an accurate and rigorous tool for generating quantitative predictions. Weather prediction is one example of a complex model used on a daily basis to predict the future. On the other hand, the notorious difficulties of making accurate weather predictions point to the need for caution in adopting the conclusions even of classical models, especially for more than short-term predictions, as one might expect from mathematically chaotic systems.
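The sensitivity that frustrates long-term forecasting can be illustrated with the Lorenz system, a minimal convection model from the weather-prediction literature. The parameters are the standard textbook values; the integration scheme below is a deliberately simple Euler sketch, not a production forecast method.

```python
# The Lorenz system with two nearly identical initial states: after a while,
# the trajectories differ by an order-one amount despite a 1e-6 perturbation.
# Standard parameters sigma=10, rho=28, beta=8/3; very small Euler step.

def lorenz_run(state, t_end, dt=1e-4, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    for _ in range(int(t_end / dt)):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return x, y, z

a = lorenz_run((1.0, 1.0, 1.0), t_end=20.0)
b = lorenz_run((1.0, 1.0, 1.000001), t_end=20.0)   # perturbed by 1e-6
gap = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
print(gap)   # far larger than the initial 1e-6 difference
```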
5.2.12. Models Expand the Range of Questions That Can Meaningfully Be Asked12
For much of life science research, questions of purpose arise about biological phenomena. For instance, the question, Why does the eye have a lens? most often calls for the purpose of the lens—to focus light rays—and only rarely for a description of the biological mechanism that creates the lens. That such an answer is meaningful is the result of evolutionary processes that shape biological entities by enhancing their ability to carry out fitness-enhancing functions. (Put differently, biological entities are the result of nature's engineering of devices to perform the function of survival; this perspective is explored further in Chapter 6.)
Lander points out that molecular biologists traditionally have shied away from teleological matters, and that geneticists generally define function not in terms of the useful things a gene does, but by what happens when the gene is altered. However, as the complexity of biological mechanism is increasingly revealed, the identification of a purpose or a function of that mechanism has enormous explanatory power. That is, what purpose does all this complexity serve?
As the examples in Section 5.4 illustrate, computational modeling is an approach to exploring the implications of the complex interactions that are known from empirical and experimental work. Lander notes that one general approach to modeling is to create models in which networks are specified in terms of elements and interactions (the network “topology”), but the numerical values that quantify those interactions (the parameters) are deliberately varied over wide ranges to explore the functionality of the network—whether it acts as a “switch,” “filter,” “oscillator,” “dynamic range adjuster,” “producer of stripes,” and so on.
Lander explains the intellectual paradigm for determining function as follows:
By investigating how such behaviors change for different parameter sets—an exercise referred to as “exploring the parameter space”—one starts to assemble a comprehensive picture of all the kinds of behaviors a network can produce. If one such behavior seems useful (to the organism), it becomes a candidate for explaining why the network itself was selected; i.e., it is seen as a potential purpose for the network. If experiments subsequently support assignments of actual parameter values to the range of parameter space that produces such behavior, then the potential purpose becomes a likely one.
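A toy version of this parameter-space exercise: scan the cooperativity parameter n of a Hill-function response and classify the resulting behavior as “graded” or “switch.” The model, parameter values, and classification threshold are illustrative, not a specific published network.

```python
# Exploring a one-dimensional parameter space: the Hill response
#   response(S) = S**n / (K**n + S**n)
# is classified by the fold change in input needed to move the output from
# 10% to 90% of maximum (small fold change = switch-like). Illustrative only.

def response(s, n, k=1.0):
    return s ** n / (k ** n + s ** n)

def ten_ninety_fold(n):
    """Input fold-change taking the output from 0.1 to 0.9 (closed form)."""
    s10 = (1.0 / 9.0) ** (1.0 / n)   # input where response = 0.1
    s90 = 9.0 ** (1.0 / n)           # input where response = 0.9
    return s90 / s10                  # equals 81**(1/n)

behaviors = {n: ("switch" if ten_ninety_fold(n) < 5.0 else "graded")
             for n in (1, 2, 4, 8)}
print(behaviors)   # low cooperativity is graded; high cooperativity switches
```

Mapping out which parameter ranges yield switch-like behavior is exactly the kind of “candidate purpose” assignment Lander describes, here in the simplest possible setting.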
5.3. TYPES OF MODELS13
5.3.1. From Qualitative Model to Computational Simulation
Biology makes use of many different types of models. In some cases, biological models are qualitative or semiquantitative. For example, graphical models show directional connections between components, with the directionality indicating influence. Such models generally summarize a great deal of known information about a pathway and facilitate the formation of hypotheses about network function. Moreover, the use of graphical models allows researchers to circumvent data deficiencies that might be encountered in the development of more quantitative (and thus data-intensive) models. (It has also been argued that probabilistic graphical models provide a coherent, statistically sound framework that can be applied to many problems, and that certain models used by biologists, such as hidden Markov models or Bayesian networks, can be regarded as special cases of graphical models.14)
On the other hand, the forms and structures of graphical models are generally inadequate to express much detail, which might well be necessary for mechanistic models. In general, qualitative models do not account for mechanisms, but they can sometimes be developed or analyzed in an automated manner. Some attempts have been made to develop formal schemes for annotating graphical models (Box 5.2).15
Box 5.2. On Graphical Models. A large fraction of today's knowledge of biochemical or genetic regulatory networks is represented either as text or as cartoon-like diagrams; text, however, has the disadvantage of being inherently ambiguous.
Qualitative models can be logical or statistical as well. For example, statistical properties of a protein-protein interaction graph have been used to infer the stability of a network's function against most “deletions” in the graph.16 Logical models can be used when data regarding mechanism are unavailable and have been developed as Boolean, fuzzy logical, or rule-based systems that model complex networks17 or genetic and developmental systems.
In some cases, greater availability of data (specifically, perturbation response or time-series data) enables the use of statistical influence models. Linear,18 neural network-like,19 and Bayesian20 models have all been used to deduce both the topology of gene expression networks and their dynamics. On the other hand, statistical influence models are not causal and may not lead to a better understanding of underlying mechanisms.
Quantitative models make detailed statements about biological processes and hence are easier to falsify than more qualitative models. These models are intended to be predictive and are useful for understanding points of control in cellular networks and for designing new functions within them.
Some models are based on power law formalisms.21 In such cases, the data are shown to fit generic power laws, and the general theory of power law scaling (for example) is used to infer some degree of causal structure. They do not provide detailed insight into mechanism, although power law models form the basis for a large class of metabolic control analyses and dynamic simulations.
Computational models—simulations—represent the other end of the modeling spectrum. Simulation is often necessary to explore the implications of a model, especially its dynamical behavior, because human intuition about complex nonlinear systems is often inadequate.22 Lander cites two examples. The first is that “intuitive thinking about MAP [mitogen-activated protein] kinase pathways led to the long-held view that the obligatory cascade of three sequential kinases serves to provide signal amplification. In contrast, computational studies have suggested that the purpose of such a network is to achieve extreme positive cooperativity, so that the pathway behaves in a switch-like, rather than a graded, fashion.”23 The second example is that while intuitive interpretations of experiments in the study of morphogen gradient formation in animal development led to the conclusion that simple diffusion is not adequate to transport most morphogens, computational analysis of the same experimental data led to the opposite conclusion.24
Simulation, which traces functional biological processes through some period of time, generates results that can be checked for consistency with existing data (“retrodiction” of data) and can also predict new phenomena not explicitly represented in but nevertheless consistent with existing datasets. Note also that when a simulation seeks to capture essential elements in some oversimplified and idealized fashion, it is unrealistic to expect the simulation to make detailed predictions about specific biological phenomena. Such simulations may instead serve to make qualitative predictions about tendencies and trends that become apparent only when averaged over a large number of simulation runs. Alternatively, they may demonstrate that certain biological behaviors or responses are robust and do not depend on particular details of the parameters involved within a very wide range.
Simulations can also be regarded as a nontraditional form of scientific communication. Traditionally, scientific communications have been carried by journal articles or conference presentations. Though articles and presentations will continue to be important, simulations—in the form of computer programs—can be easily shared among members of the research community, and the explicit knowledge embedded in them can become powerful points of departure for the work of other researchers.
With the availability of cheap and powerful computers, modeling and simulation have become nearly synonymous. Yet, a number of subtle differences should be mentioned. Simulation can be used as a tool on its own or as a companion to mathematical analysis.
In the case of relatively simple models meant to provide insight or reveal a concept, analytical and mathematical methods are of primary utility. With a few pen-and-paper computations, the dependence of behavior on underlying parameters (such as rate constants), conditions for specific dynamical behavior, and approximate connections between macroscopic quantities (e.g., the velocity of a cell) and underlying microscopic quantities (such as the number of actin filaments causing the membrane to protrude) can be revealed. Simulations are not as easily harnessed to making such connections.
Simulations can also be used hand-in-hand with analysis of simple models: exploring slight changes in equations, assumptions, or rates builds familiarity with a model and can guide the choice of directions most worth pursuing analytically. For example, G. Bard Ermentrout at the University of Pittsburgh developed XPP software as an evolving and publicly available experimental modeling tool for mathematical biologists.25 XPP has been the foundation of computational investigations of many challenging problems in neurophysiology, coupled oscillators, and other realms.
Mathematical analysis of models, at any level of complexity, is often restricted to special cases that have simple properties: rectangular boundaries, specific symmetries, or behavior in a special class. Simulations can expand the repertoire and allow the modeler to understand how analysis of the special cases relates to more realistic situations. In this case, simulation takes over where analysis ends.26 Some systems are simply too large or elaborate to be understood using analytical techniques. In this case, simulation is a primary tool. Forecasts requiring heavy “number-crunching” (e.g., weather prediction, prediction of climate change), as well as those involving huge systems of diverse interacting components (e.g., cellular networks of signal transduction cascades), are only amenable to exploration using simulation methods.
More detailed models require a detailed consideration of chemical or physical mechanisms involved (i.e., these models are mechanistic27). Such models require extensive details of known biology and have the largest data requirements. They are, in principle, the most predictive. In the extreme, one can imagine a simulation of a complete cell—an “in silico” cell or cybercell—that provides an experimental framework in which to investigate many possible interventions. Getting the right format, and ensuring that the in silico cell is a reasonable representation of reality, has been and continues to be an enormous challenge.
No reasonable model is based entirely on a bottom-up analysis. Consider, for example, that solving Schrödinger's equation for the millions of atoms in a complex molecule in solution would be a futile exercise, even if future supercomputers could handle this task. The question to ask is how and why such work would be contemplated: finding the correct level of representation is one of the key steps to good scientific work. Thus, some level of abstraction is necessary to render any model both interesting scientifically and feasible computationally. Done properly, abstractions can clarify the sources of control in a network and indicate where more data are necessary. At the same time, it may be necessary to construct models at higher degrees of biophysical realism and detail in any event, either because abstracted models often do not capture the essential behavior of interest or to show that indeed the addition of detail does not affect the conclusions drawn from the abstracted model.28
It is also helpful to note the difference between a computational artifact that reproduces some biological behavior (a task) and a simulation. In the former case, the relevant question is: “How well does the artifact accomplish the task?” In the latter case, the relevant question is: “How closely does the simulation match the essential features of the system in question?”
Most computer scientists would tend to assign higher priority to task performance than to fidelity of simulation. The computer scientist would be most interested in a biologically inspired approach to a computer science problem when some biological behavior is useful in a computational or computer systems context and when the biologically inspired artifact can demonstrate better performance than is possible through some other way of developing or inspiring the artifact. A model of a biological system then becomes useful to the computer scientist only to the extent that high-fidelity mimicking of how nature accomplishes a task will result in better performance of that task.
By contrast, biologists would put greater emphasis on simulation. Empirically tested and validated simulations with predictive capabilities would increase their confidence that they understood in some fundamental sense the biological phenomenon in question. However, it is important to note that because a simulation is judged on the basis of how closely it represents the essential features of a biological system, the question “What counts as essential?” is central (Box 5.3). More generally, one fundamental focus of biological research is a determination of what the “essential” features of a biological system are, recognizing that what is “essential” cannot be determined once and for all, but rather depends on the class of questions under consideration.
Box 5.3. An Illustration of “Essential”. Consider the following modeling task: the phenomenon of interest is a monkey learning to fetch a banana from behind a transparent conductive screen.
5.3.2. Hybrid Models
Hybrid models are models composed of objects with different mathematical representations. These allow a model builder the flexibility to mix modeling paradigms to describe different portions of a complex system. For example, in a hybrid model, a signal transduction pathway might be described by a set of differential equations, and this pathway could be linked to a graphical model of the genetic regulatory network that it influences. An advantage of hybrid models is that model components can evolve from high-level abstract descriptions to low-level detailed descriptions as the components are better characterized and understood.
An example of hybrid model use is offered by McAdams and Shapiro,29 who point out that genetic networks involving large numbers of genes (more than tens) are difficult to analyze. Noting the “many parallels in the function of these biochemically based genetic circuits and electrical circuits,” they propose “a hybrid modeling approach that integrates conventional biochemical kinetic modeling within the framework of a circuit simulation. The circuit diagram of the bacteriophage lambda lysis-lysogeny decision circuit represents connectivity in signal paths of the biochemical components. A key feature of the lambda genetic circuit is that operons function as active integrated logic components and introduce signal time delays essential for the in vivo behavior of phage lambda.”
There are good numerical methods for simulating systems that are formulated in terms of ordinary differential equations or algebraic equations, although good methods for analysis of such models are still lacking. Other systems, such as those that mix continuous with discrete time or Markov processes with partial differential equations, are sometimes hard to solve even by numerical methods. Further, a particular model object may change mathematical representation during the course of the analysis. For example, at the beginning of a biosynthetic process there may be very small amounts of product, so its concentration would have to be modeled discretely. As more of it is synthesized, the concentration becomes high enough that a continuous approximation is justified and is then more efficient for simulation and analysis.
The point at which this switch is made is dependent not just on copy number but also on where in the dynamical state space the system resides. If the system is near a bifurcation point, small fluctuations may be significant. Theories of how to accomplish this dynamic switching are lacking. As models grow more complex, different parts of the system will have to be modeled with different mathematical representations. Also, as models from different sources begin to be joined, it is clear that different representations will be used. It is critical that the theory and applied mathematics of hybrid dynamical systems be developed.
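A toy sketch of such a discrete-to-continuous handoff, for a single species synthesized at a constant rate and degraded in proportion to its copy number: the process is simulated stochastically (Gillespie-style) while the copy number is small, then handed off to the continuous ODE once it crosses a threshold. The rates and switching threshold are illustrative, and the fixed threshold deliberately ignores the state-space subtleties discussed above.

```python
import random

# Hybrid sketch: synthesis at rate k, degradation at rate g per molecule.
# Discrete (exact stochastic) simulation at low copy number, then the
# continuous ODE dn/dt = k - g*n. All numbers are illustrative.

random.seed(7)
K_SYN, G_DEG, SWITCH_AT = 500.0, 1.0, 50

def gillespie_until(n, threshold):
    """Exact stochastic simulation until the copy number reaches threshold."""
    t = 0.0
    while n < threshold:
        total = K_SYN + G_DEG * n          # combined event rate
        t += random.expovariate(total)     # waiting time to the next event
        if random.random() < K_SYN / total:
            n += 1                         # synthesis event
        else:
            n -= 1                         # degradation event
    return n, t

def euler_ode(n, t_remaining, dt=1e-4):
    for _ in range(int(t_remaining / dt)):
        n += dt * (K_SYN - G_DEG * n)
    return n

n, t_used = gillespie_until(0, SWITCH_AT)       # discrete regime
n_final = euler_ode(float(n), 10.0 - t_used)    # continuous regime
print(n_final)   # settles near the deterministic steady state k/g
```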
5.3.3. Multiscale Models
Multiscale models describe processes occurring at many time and length scales. Depending on the biological system of interest, the data needed to provide the basis for a greater understanding of the system will cut across several scales of space and time. The length dimensions of biological interest range from small organic molecules, to multiprotein complexes at 100 angstroms, to cellular processes at 1,000 angstroms, to tissues at 1-10 microns, and on to the interaction of human populations with the environment at the kilometer scale. The temporal domain ranges from the femtosecond chemistry of molecular interactions to the millions of years of evolutionary time, with protein folding occurring in seconds and cell and developmental processes in minutes, hours, and days. In turn, the scale of the process involved (e.g., from the molecular scale to the ecosystem scale) affects both the complexity of the representation (e.g., molecule-based or concentration-based, at equilibrium or fully dynamic) and the modality of the representation (e.g., biochemical, genetic, genomic, electrophysiological, etc.).
Consider the heart as an example. The macroscopic unit of interest is the heartbeat, which lasts about a second and involves the whole heart of 10 cm scale. But the cardiac action potential (the electrical signal that initiates myocellular contractions) can change significantly on time scales of milliseconds as reflected in the appropriate kinetic equations. In turn, the molecular interactions that underlie kinetic flows occur on time scales on the order of femtoseconds. Across such variation in time scales, it is not feasible to model 10^15 molecular interactions in order to model a complete heartbeat. Fortunately, in many situations the response with the shorter time scale will converge quickly to equilibrium or quasi-steady-state behavior, obviating the need for a complete lower-level simulation.30
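The quasi-steady-state idea can be made concrete with the classic enzyme kinetics example: the full mass-action model of E + S <-> ES -> E + P versus the reduced Michaelis-Menten rate law obtained by assuming the ES complex equilibrates quickly. The rate constants below are arbitrary illustrative values, chosen so the fast binding step is much quicker than catalysis.

```python
def full_model(e0=1.0, s0=10.0, k1=100.0, km1=100.0, k2=1.0,
               t_end=5.0, dt=1e-5):
    """Full mass-action kinetics for E + S <-> ES -> E + P (forward Euler).
    The small dt is needed to resolve the fast binding step."""
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        v_bind = k1 * e * s - km1 * es
        v_cat = k2 * es
        e += dt * (-v_bind + v_cat)
        s += dt * (-v_bind)
        es += dt * (v_bind - v_cat)
        p += dt * v_cat
    return p

def qssa_model(e0=1.0, s0=10.0, k1=100.0, km1=100.0, k2=1.0,
               t_end=5.0, dt=1e-3):
    """Reduced model: ES at quasi-steady state, dP/dt = Vmax * S / (Km + S).
    A much larger dt suffices because the fast variable has been eliminated."""
    km = (km1 + k2) / k1
    vmax = k2 * e0
    s, p = s0, 0.0
    for _ in range(int(t_end / dt)):
        v = vmax * s / (km + s)
        s -= dt * v
        p += dt * v
    return p
```

The two models agree closely here because enzyme is scarce relative to substrate; the payoff is a hundredfold larger stable step size in the reduced model.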
For most biological problems, the scale at which data could provide a central insight into the operation of the whole system is not known, so multiple scales are of interest. Thus, biological models have to allow for transition among different levels of resolution. A biologist might describe a protein as a simple ellipsoid and then in the next breath explain the effect of a point mutation by the atomic-level structural changes it causes in the active site.31
Identifying the appropriate ranges of parameters (e.g., rate constants that govern the pace of chemical reactions) remains one of the difficulties that every modeler faces sooner or later. As modelers know well, even qualitative analysis of simple models depends on knowing which “leading-order terms” are to be kept on which time scales. When the relative rates are entirely unknown—true of many biochemical steps in living cells—it is hard to know where to start and how to assemble a relevant model, a point that underscores the importance of close dialogue between the laboratory biologist and the mathematical or computational modeler.
Finally, data obtained at a particular scale must be sufficient to summarize the essential biological activity at that scale in order to be evaluated in the context of interactions at greater scales of complexity. The challenge, therefore, is one of understanding not only the relationship of multiple variables operating at one scale of detail, but also the relationship of multivariable datasets collected at different scales.
5.3.4. Model Comparison and Evaluation
Models are ultimately judged by their ability to make predictions. Qualitative models predict trends or types of dynamics that can occur, as well as thresholds and bifurcations that delineate one type of behavior from another. Quantitative models predict values that can be compared to actual experimental data. Therefore, the selection of experiments to be performed can be determined, at least in part, by their usefulness in constraining a model or selecting one model from a set of competing models.
The first step in model evaluation is to replicate and test a published computational model of a biological system. However, most papers contain typographical errors and do not provide a complete specification of the biological properties represented in the model. In principle, the specification could be extracted from the model's source code, but for a whole host of reasons it is not always possible to obtain the actual files that were used for the published work.
In the neuroscience field, ModelDB (http://senselab.med.yale.edu/senselab/modeldb/) is being developed to meet the need for a database of published models used in neuroscience research.32 It is part of the SenseLab project (http://senselab.med.yale.edu/), which is supported through the Human Brain Project by the National Institute of Mental Health (NIMH), the National Institute of Neurological Disorders and Stroke (NINDS), and the National Cancer Institute (NCI).
ModelDB is a curated database that is designed for convenient entry, search, and retrieval of models written for any programming language or simulation environment. As of December 10, 2004, it contained 141 downloadable models. Most of these are for NEURON, but 40 of them are for MATLAB, GENESIS, SNNAP, or XPP, and there are also some models in C/C++ and FORTRAN. Database entries are linked to the published literature so that users can more easily determine the “scientific context” of any given model.
Although ModelDB is still in a developmental or research stage, it has already begun to have a positive effect on computational modeling in neuroscience. Database logs indicate that it is seeing heavy usage, and from personal communications the committee has learned that even experienced programmers who write their own code in C/C++ regularly examine models written for NEURON and other domain-specific simulators in order to determine key parameter values and other important details. Recently published papers are beginning to appear that cite ModelDB and the models it contains as sources of code, equations, or parameters. Furthermore, a leading journal has adopted a policy that requires authors to make their source code available as a condition of publication and encourages them to use ModelDB for this purpose.
As for model comparison, it is not possible to ascertain in isolation whether a given model is correct, since contradictory data may become available later, and indeed even “incorrect” models may make correct predictions. Suitably complex models can be made to fit any dataset, and one must guard against “overfitting” a model. Thus, the predictions of a model must be viewed in the context of the number of degrees of freedom of the model, and one criterion for judging one model better than another is which model explains the experimental data with the least model complexity. In some cases, measures of the statistical significance of a model can be computed using a likelihood distribution over predicted state variables, taking into account the number of degrees of freedom present in the model.
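One widely used formalization of this trade-off is the Akaike information criterion (AIC), which penalizes the fit quality by the number of fitted parameters. The sketch below uses synthetic data with a known linear trend plus a small deterministic, noise-like perturbation; the overparameterized quintic achieves a lower residual sum of squares, but the complexity penalty makes AIC prefer the true linear model.

```python
import numpy as np

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors:
    AIC = n * ln(RSS / n) + 2k, where k is the number of fitted parameters."""
    return n * np.log(rss / n) + 2 * k

# Synthetic data: a true linear trend plus a small alternating perturbation
# standing in for measurement noise (deterministic, so the example is repeatable).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * (-1.0) ** np.arange(50)

scores = {}
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    scores[degree] = aic_gaussian(rss, len(x), degree + 1)

# The degree-5 fit has the smaller RSS, but the complexity penalty
# makes AIC prefer the simpler (true) linear model.
best = min(scores, key=scores.get)   # -> 1
```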
At the same time, lessons learned over many centuries of scientific investigation regarding the use of Occam's Razor may have limited applicability in this context. Because biological phenomena are the result of an evolutionary process that simply uses what is available, many biological phenomena are simply cobbled together and in no sense can be regarded as the “simplest” way to accomplish something.
As noted in Footnote 28, there is a tension between the need to capture details faithfully in a model and the desire to simplify those details so as to arrive at a representation that can be analyzed, understood fully, and converted into scientific “knowledge.” There are numerous ways of reducing models that are well known in applied mathematics communities. These include dimensional analysis and multiple time-scale analysis (i.e., dissecting a system into parts that evolve rapidly versus those that change on a slower time scale). In some cases, leaving out some of the interacting components (e.g., those whose interactions are weakest or least significant) may be a workable method. In other cases, lumping together families or groups of substances to form aggregate components or compartments works best. Sensitivity analysis of alternative model structures and parameters can be performed using likelihood and significance measures. Sensitivity analysis is important to inform a model builder of the essential components of the model and to attempt to reduce model complexity without loss of explanatory power.
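A minimal version of such a sensitivity analysis is sketched below: a one-at-a-time, normalized finite-difference sensitivity (d ln f / d ln p) of a model output with respect to each parameter. The model here is an illustrative logistic growth equation evaluated at a fixed end time; parameters with near-zero coefficients are candidates for elimination or lumping.

```python
def logistic_final(r, K, n0=1.0, t_end=10.0, dt=0.001):
    """Logistic growth dN/dt = r N (1 - N/K), integrated by forward Euler;
    returns the population at t_end."""
    n = n0
    for _ in range(int(t_end / dt)):
        n += dt * r * n * (1.0 - n / K)
    return n

def sensitivity(f, params, name, rel_step=1e-4):
    """Normalized finite-difference sensitivity d(ln f)/d(ln p)
    for one parameter, holding the others fixed."""
    base = f(**params)
    bumped = dict(params)
    bumped[name] = params[name] * (1.0 + rel_step)
    return (f(**bumped) - base) / (base * rel_step)

params = {"r": 1.0, "K": 100.0}
s_r = sensitivity(logistic_final, params, "r")
s_K = sensitivity(logistic_final, params, "K")
# By t = 10 the population has essentially saturated at K, so the output is
# highly sensitive to K (coefficient near 1) and insensitive to r (near 0).
```

In a reduced model of this late-time behavior, r could be dropped entirely without loss of explanatory power, which is exactly the kind of simplification the sensitivity coefficients are meant to reveal.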
Model evaluation can be complicated by the robustness of the biological organism being represented. Robustness generally means that the organism will endure and even prosper under a wide range of conditions—which means that its behavior and responses are relatively insensitive to variations in detail.33 That is, such differences are unlikely to matter much for survival. (For example, the modeling of genetic regulatory networks can be complicated by the fact that although the data may show that a certain gene is expressed under certain circumstances, the biological function being served may not depend on the expression of that gene.) On the other hand, this robustness may also mean that a flawed understanding of detailed processes incorporated into a model that does explain survival responses and behavior will not be reflected in the model's output.34
Simulation models are essentially computer programs and hence suffer from all of the problems that plague software development. Normal practice in software development calls for extensive testing, in which a program is checked against test data for which the correct results are known independently of the program, as well as for independent code reviews. In principle, simulation models of biological systems could be subject to such practices. Yet the fact that a given simulation model returns results that are at variance with experimental data may be attributable to an inadequacy of the underlying model or to an error in programming.35 Note also that public code reviews are impossible if the simulation models are proprietary, as they often are when they are created by firms seeking to obtain competitive advantage in the marketplace.
These points suggest a number of key questions in the development of a model.
How much is given up by looking at simplified versions?
How much poorer, and in what ways poorer, is a simplified model in its ability to describe the system?
Are there other, new ways of simplifying and extracting salient features?
Once the simplified representation is understood, how can the details originally left out be reincorporated into a model of higher fidelity?
Finally, another approach to model evaluation is based on notions of logical consistency. This approach uses program verification tools originally developed by computer scientists to determine whether a given program is consistent with a given formal specification or property. In the biological context, these tools are used to check the consistency and completeness of a model's description of the biological system's processes. These descriptions are dynamic and thus permit “running” a model to observe developments in time. Specifically, Kam et al. have demonstrated this approach using the languages, methods, and tools of scenario-based reactive system design and applied it to modeling the well-characterized process of cell fate acquisition during Caenorhabditis elegans vulval development. (Box 5.4 describes the intellectual approach in more detail.36)
Formal Modeling of Caenorhabditis elegans Development. Our understanding of biology has become sufficiently complex that it is increasingly difficult to integrate all the relevant facts using abstract reasoning alone. [Formal modeling presents] a novel (more...)
5.4. MODELING AND SIMULATION IN ACTION
The preceding discussion has been highly abstract. This section provides some illustrations of how modeling and simulation have value across a variety of subfields in biology. No claim is made to comprehensiveness, but the committee wishes to illustrate the utility of modeling and simulations at levels of organization from gene to ecosystem.
5.4.1. Molecular and Structural Biology
5.4.1.1. Predicting Complex Protein Structures
Interactions between proteins are crucial to the functioning of all cells. Although much experimental information on protein structures is being gathered, many interactions are not fully understood and must be modeled computationally. Computational prediction of protein-protein complex structures remains an unsolved problem and is one of the most active areas of research in bioinformatics and structural biology.
ZDOCK and RDOCK are two computer programs that address this problem, also known as protein docking.37 ZDOCK is an initial stage protein docking program that performs a full search of the relative orientations of two molecules (referred to by convention as the ligand and receptor) to determine their best fit based on surface complementarity, electrostatics and desolvation. The efficiency of the algorithm is enhanced by discretizing the molecules onto a grid and performing a fast Fourier transform (FFT) to quickly explore the translational degrees of freedom.
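The FFT trick can be illustrated in miniature: for score functions expressible as a grid correlation, the scores of all translations are obtained in a single transform pair rather than by looping over placements. The grids below are toy shape-complementarity maps, not ZDOCK's actual potentials, and only the translational search is shown (the rotational search would repeat this scan for each orientation).

```python
import numpy as np

def fft_translational_scan(receptor, ligand):
    """Score every translation of `ligand` against `receptor` at once:
    scores[dy, dx] = sum_xy receptor[y, x] * ligand[y - dy, x - dx],
    computed as a circular cross-correlation via the FFT."""
    fr = np.fft.fft2(receptor)
    fl = np.fft.fft2(ligand)
    return np.real(np.fft.ifft2(fr * np.conj(fl)))

# Toy grids: a 3x3 "ligand" footprint, and a receptor whose favorable
# region is that same footprint shifted to (dy, dx) = (5, 7).
ligand = np.zeros((16, 16))
ligand[0:3, 0:3] = 1.0
receptor = np.roll(np.roll(ligand, 5, axis=0), 7, axis=1)

scores = fft_translational_scan(receptor, ligand)
dy, dx = np.unravel_index(np.argmax(scores), scores.shape)
# Best translation recovered: (dy, dx) == (5, 7)
```

The full scan costs O(N^2 log N) for an N x N grid, versus O(N^4) for the naive placement-by-placement loop, which is the source of ZDOCK's efficiency.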
RDOCK takes as input the ZDOCK predictions and improves them using two steps. The first step is to improve the energetics of the prediction and remove clashes by performing small movements of the predicted complex, using a program known as CHARMM. The second step is to rescore these minimized predictions with more detailed scoring functions for electrostatics and desolvation.
The combination of these two algorithms has been tested and verified with a benchmark set of proteins collected for use in testing docking algorithms. Now at version 2.0, this benchmark is publicly available and contains 87 test cases. These test cases cover a breadth of interactions, such as antibody-antigen, and cases involving significant conformational changes.
The ZDOCK-RDOCK programs have consistently performed well in the international docking competition CAPRI (Figure 5.1). Some notable predictions were the rotavirus VP6/Fab complex (50 of 52 contacting residues correctly predicted), the SAG-1/Fab complex (61 of 70 contacts correct), and the cellulosome cohesin-dockerin structure (50 of 55 contacts correct). In the first two cases, the number of correct contacts in the ZDOCK-RDOCK predictions was the highest among all participating groups.
The ZDOCK/RDOCK prediction for dockerin (in red) superposed on the crystal structure for CAPRI Target 13, cohesin/dockerin. SOURCE: Courtesy of Brian Pierce and Zhiping Weng, Boston University.
5.4.1.2. A Method to Discern a Functional Class of Proteins
The DNA-binding helix-turn-helix structural motif plays an essential role in a variety of cellular pathways that include transcription, DNA recombination and repair, and DNA replication. Current methods for identifying the motif rely on amino acid sequence, but since members of the motif belong to different sequence families that have no sequence homology to each other, these methods have been unable to identify all motif members.
A new method based on three-dimensional structure was created that involved the following steps:38 (1) choosing a conserved component of the motif, (2) measuring structural features relative to that component, and (3) creating classification models by comparing measurements of structures known to contain the motif to measurements of structures known not to contain the motif. In this case, the conserved component chosen was the recognition helix (i.e., the alpha helix that makes sequence-specific contact with DNA), and two types of relevant measurements were the hydrophobic area of interaction between secondary structure elements (SSEs) and the relative solvent accessibility of SSEs.
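A stripped-down version of step 3 can be sketched as a nearest-centroid classifier over two structural measurements. The feature values and the choice of classifier below are invented for illustration; the published work used richer measurements and models.

```python
def centroid(samples):
    """Mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def nearest_centroid_classifier(positives, negatives):
    """Build a two-class classifier from labeled feature vectors.
    Features here are hypothetical stand-ins for the measurements in the
    text, e.g. (hydrophobic interaction area, relative solvent accessibility)."""
    c_pos, c_neg = centroid(positives), centroid(negatives)
    def classify(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return "motif" if d_pos < d_neg else "non-motif"
    return classify

# Toy training data: HTH-like structures show larger buried hydrophobic
# area and lower solvent accessibility of the recognition helix.
motif_examples = [(120.0, 0.15), (135.0, 0.10), (110.0, 0.20)]
nonmotif_examples = [(40.0, 0.55), (55.0, 0.60), (30.0, 0.45)]

classify = nearest_centroid_classifier(motif_examples, nonmotif_examples)
# classify((125.0, 0.12)) -> "motif"; classify((45.0, 0.50)) -> "non-motif"
```

Because the features are structural rather than sequence-based, such a classifier can in principle flag motif members that share no detectable sequence homology, which is the key advantage claimed for the method.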
With a classification model created, the entire Protein Data Bank of experimentally measured structures was searched and new examples of the motif were found that have no detected sequence homology with previously known examples. Two such examples are Esa1 histone acetyltransferase and isoflavone 4-O-methyltransferase. The result emphasizes an important utility of the approach: sequence-based methods used to discern a functional class of proteins may be supplemented through the use of a classification model based on three-dimensional structural information.
5.4.1.3. Molecular Docking
Using a simple, uniform representation of molecular surfaces that requires minimal parameterization, Jain39 has constructed functions that are effective for scoring protein-ligand interactions, quantitatively comparing small molecules, and making comparisons of proteins in a manner that does not depend on protein backbone. These methods rely on computational approaches that are rooted in understanding the physics of molecular interactions, but whose functional forms do not resemble those used in physics-based approaches. That is, this problem can be treated as a pure computer science problem that can be solved using combinations of scoring and search or optimization techniques parameterized with the use of domain knowledge. The approach is as follows:
Molecules are approximated as collections of spheres with fixed radii: H = 1.2; C = 1.6; N = 1.5; O = 1.4; S = 1.95; P = 1.9; F = 1.35; Cl = 1.8; Br = 1.95; I = 2.15.
A labeling of the features of polar atoms is superimposed on the molecular representation: polarity, charge, and directional preference (Figure 5.2, subfigures A and B).
A scoring function is derived that, given a protein and a ligand in some relative alignment, yields a prediction of the energy of interaction.
The function is parameterized in terms of the pairwise distances between molecular surfaces.
The dominant terms are a hydrophobic term that characterizes interactions between nonpolar atoms and a polar term that captures complementary polar contacts with proper directionality.
The parameters of the function were derived from empirical binding data and 34 protein-ligand complexes that were experimentally determined.
The scoring function is described in Figure 5.2, Subfigure C. The hydrophobic term peaks at approximately 0.1 unit with a slight surface interpenetration. The polar term for an ideal hydrogen bond peaks at 1.25 units, and a charged interaction (tertiary amine proton (+1.0) to a charged carboxylate (−0.5)) peaks at about 2.3 units. Note that this scoring function looks nothing like a force field derived from molecular mechanics.
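The qualitative shape of such a function can be sketched with simple distance-dependent terms. The Gaussian forms, widths, and offsets below are assumptions chosen only to reproduce the peak heights quoted above (0.1 for a nonpolar contact, 1.25 for an ideal hydrogen bond); they are not Jain's published parameterization.

```python
import math

# Fixed atomic radii from the molecular representation described in the text.
RADII = {"H": 1.2, "C": 1.6, "N": 1.5, "O": 1.4, "S": 1.95,
         "P": 1.9, "F": 1.35, "Cl": 1.8, "Br": 1.95, "I": 2.15}

def surface_distance(elem_a, elem_b, center_dist):
    """Gap between the two atomic spheres (negative = interpenetration)."""
    return center_dist - (RADII[elem_a] + RADII[elem_b])

def hydrophobic_term(d, peak=0.1, d0=-0.1, width=0.6):
    """Illustrative Gaussian reward for nonpolar contact, peaking at a
    slight surface interpenetration (shape and constants are assumptions)."""
    return peak * math.exp(-((d - d0) / width) ** 2)

def polar_term(d, strength=1.25, d0=-0.2, width=0.4):
    """Illustrative reward for a complementary polar contact; `strength`
    would scale up for charged interactions."""
    return strength * math.exp(-((d - d0) / width) ** 2)

def score_pair(elem_a, elem_b, center_dist, polar_pair):
    d = surface_distance(elem_a, elem_b, center_dist)
    return polar_term(d) if polar_pair else hydrophobic_term(d)

# Two carbons at 3.1 A: surfaces interpenetrate by 0.1 A, the hydrophobic
# optimum here; an N-H...O pair at 2.7 A scores on the polar term instead.
s_cc = score_pair("C", "C", 3.1, polar_pair=False)
s_no = score_pair("N", "O", 2.7, polar_pair=True)
```

Note that, as the text emphasizes, nothing in these functional forms resembles a molecular-mechanics force field; they are parameterized surrogates fit to reproduce binding behavior.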
Figure 5.2, Subfigure D compares eight docking methods on screening efficiency using thymidine kinase as a docking target. For the test, 10 known ligands and 990 random ligands were used. Particularly at low false-positive rates (low database coverage), the scoring function approach shows substantial improvements over the other methods.
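Screening efficiency of this kind can be computed directly from a ranked hit list: at a given database coverage, what fraction of the known ligands has been recovered? The sketch below uses a scaled-down toy screen (10 actives among 100 compounds) rather than the actual thymidine kinase data.

```python
def hits_at_coverage(scored, coverage):
    """Fraction of the known ligands recovered when only the top
    `coverage` fraction of the ranked database is examined.
    `scored` is a list of (score, is_known_ligand) pairs; higher is better."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    n_keep = max(1, int(round(coverage * len(ranked))))
    n_true = sum(1 for _, is_lig in ranked if is_lig)
    found = sum(1 for _, is_lig in ranked[:n_keep] if is_lig)
    return found / n_true

# Toy screen: 10 known ligands scoring high, 90 decoys scoring lower,
# mimicking the 10-actives / 990-decoys thymidine kinase test at small scale.
scored = [(0.9 - 0.01 * i, True) for i in range(10)] + \
         [(0.5 - 0.001 * j, False) for j in range(90)]
recall_at_5pct = hits_at_coverage(scored, 0.05)   # top 5 compounds -> 0.5
```

Plotting this quantity against coverage yields the enrichment curves of Subfigure D; performance at low coverage matters most in practice, because only the top-ranked compounds are ever assayed.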
A Computational Approach to Molecular Docking. SOURCE: Courtesy of A.N. Jain, University of California, San Francisco.
5.4.1.4. Computational Analysis and Recognition of Functional and Structural Sites in Protein Structures40
Structural genomics initiatives are producing a great increase in protein three-dimensional structures determined by X-ray and nuclear magnetic resonance technologies as well as those predicted by computational methods. A critical next step is to study the relationships between protein structures and functions. Studying structures individually entails the danger of identifying idiosyncratic rather than conserved features and the risk of missing important relationships that would be revealed by statistically pooling relevant data. The expected surfeit of protein structures provides an opportunity to develop computational methods for collectively examining multiple biological structures and extracting key biophysical and biochemical features, as well as methods for automatically recognizing these features in new protein structures.
Wei and Altman have developed an automated system known as FEATURE that statistically characterizes important functional and structural sites in protein structures, such as active sites, binding sites, disulfide bonding sites, and so forth. FEATURE collects all known examples of a type of site from the Protein Data Bank (PDB) as well as a number of control “nonsite” examples. For each of them, FEATURE computes the spatial distributions of a large set of defined biophysical and biochemical properties spanning multiple levels of detail in order to capture conserved features beyond basic amino acid sequence similarity. It then uses a nonparametric statistical test, the Wilcoxon rank-sum test, to find the features that are characteristic of the sites, in the context of the control nonsites. Figure 5.3 shows the statistical features of calcium binding sites.
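The core statistical step can be sketched as follows: a Wilcoxon rank-sum comparison of one property's values at sites versus control nonsites, returning a z-score under the usual normal approximation. The property values below are invented for illustration (e.g., a charge-density measurement in one radial shell around candidate calcium sites).

```python
import math

def rank_sum_z(sites, nonsites):
    """Wilcoxon rank-sum test (normal approximation, average ranks for ties).
    Returns a z-score for whether `sites` values are systematically larger."""
    combined = sorted((v, i) for i, v in enumerate(sites + nonsites))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1            # average rank for a tied run
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(sites), len(nonsites)
    w = sum(ranks[:n1])                  # rank sum of the site sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd

# Hypothetical property values at 8 calcium sites versus 8 control nonsites.
site_vals = [0.9, 0.8, 0.85, 0.95, 0.7, 0.88, 0.92, 0.75]
nonsite_vals = [0.2, 0.3, 0.1, 0.25, 0.4, 0.15, 0.35, 0.05]
z = rank_sum_z(site_vals, nonsite_vals)   # strongly positive: feature enriched
```

Being rank-based, the test makes no assumption about the property's distribution, which suits the heterogeneous biophysical measurements FEATURE collects.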
Statistical features of calcium binding sites determined by FEATURE. The volumes in this case correspond to concentric radial shells 1 Å in thickness around the calcium ion or a control nonsite location. The column shows properties that are statistically (more...)
By using a Bayesian scoring function that recognizes whether a local region within a three-dimensional structure is likely to be any of the sites and a scanning procedure that searches the whole structure for the sites, FEATURE can also provide an initial annotation of new protein structures. FEATURE has been shown to have good sensitivity and specificity in recognizing a diverse set of site types, including active sites, binding sites, and structural sites and is especially useful when the sites do not have conserved residues or residue geometry. Figure 5.4 shows the result of searching for ATP (adenosine triphosphate) binding sites in a protein structure.
Results of automatic scanning for ATP binding sites in the structure of casein kinase (PDB ID 1csn) using WebFEATURE, a freely available, Web-based server of FEATURE. The solid red dots show the prediction of FEATURE; they correspond correctly with the (more...)
5.4.2. Cell Biology and Physiology
5.4.2.1. Cellular Modeling and Simulation Efforts
Cellular simulation requires a theoretical framework for analyzing the interactions of molecular components, of modules made up of those components, and of systems in which such modules are linked to carry out a variety of functions. The theoretical goal is to quantitatively organize, analyze, and interpret complex data on cell biological processes, and experiments provide images, biochemical and electrophysiological data on the initial concentrations, kinetic rates, and transport properties of the molecules and cellular structures that are presumed to be the key components of a cellular event.41 A simulation embeds the relevant rate laws and rate constants for the biochemical transformations being modeled. Based on these laws and parameters, the model accepts as initial conditions the initial concentrations, diffusion coefficients, and locations of all molecules implicated in the transformation, and generates predictions for the concentration of all molecular species as a function of time and space. These predictions are compared against experiment, and the differences between prediction and experiment are used to further refine the model. If the system is perturbed by the addition of a ligand, electrical stimulus, or other experimental intervention, the model should be capable of predicting changes as well in the relevant spatiotemporal distributions of the molecules involved.
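A minimal instance of this scheme, for a toy linear pathway A -> B -> C with assumed mass-action rate constants, shows the general shape of such simulations: rate laws plus initial concentrations in, species trajectories out, ready for comparison against measured time courses. The species, rates, and initial conditions are illustrative, not drawn from any real pathway.

```python
def simulate_pathway(a0=10.0, k1=0.5, k2=0.2, t_end=20.0, dt=0.001):
    """Mass-action kinetics for the toy pathway A -> B -> C (forward Euler).
    Initial concentrations and rate constants play the role of the
    experimentally measured inputs; output is each species versus time."""
    a, b, c = a0, 0.0, 0.0
    trajectory = []
    steps = int(t_end / dt)
    for step in range(steps + 1):
        if step % 1000 == 0:              # record every 1.0 time unit
            trajectory.append((step * dt, a, b, c))
        v1, v2 = k1 * a, k2 * b           # rate laws for the two reactions
        a -= dt * v1
        b += dt * (v1 - v2)
        c += dt * v2
    return trajectory

traj = simulate_pathway()
t_last, a_last, b_last, c_last = traj[-1]
# Mass is conserved (A + B + C stays at a0), and by t = 20 most material
# has flowed through the intermediate B into the product C.
```

Comparing such predicted trajectories against measured concentrations, and adjusting the rate constants to reduce the discrepancy, is the refine-and-repeat loop described above.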
There are many different tools for simulating and analyzing models of cellular systems (Table 5.1). More general tools, such as Mathematica and MATLAB or other systems that can be used for solving systems of differential or stochastic-differential equations, can be used to develop simulations, and because these tools are commonly used by many researchers, their use facilitates the transfer of models among different researchers. Another approach is to link data gathering and biological information systems to software that can integrate and predict behavior of interacting components (currently, researchers are far from this goal, but see Box 5.5 and Box 5.6). Finally, several platform-independent model specification languages are under development that will facilitate greater sharing and interoperability. For example, SBML,42 Gepasi,43 and CellML44 are specialized systems for biological and biochemical modeling. Madonna45 is a general-purpose system for solving a variety of equations (differential equations, integral equations, and so on).
BioSPICE. BioSPICE, the Biological Simulation Program for Intra-Cellular Evaluation, is in essence a modeling framework that provides users with model components, tools, databases, and infrastructure to develop predictive dynamical models of cellular (more...)
Cytoscape. A variety of computer-aided models has been developed to simulate biological networks, typically focusing on specific cellular processes or single pathways. Cytoscape is a modeling environment particularly suited to the analysis of global data (more...)
Rice and Stolovitzky describe the task of inferring signaling, metabolic, or gene regulatory pathways from experimental data as one of reverse engineering.46 They note that automated, high-throughput methods that collect species- and tissue-specific datasets in large volume can help to deal with the risks in generalizing signaling pathways from one organism to another. At the same time, fully detailed kinetic models of intracellular processes are not generally feasible. Thus, one step is to consider models that describe network topology (i.e., that identify the interactions between nodes in the system—genes, proteins, metabolites, and so on). A model with more detail would describe network topology that is causally directional (i.e., that specifies which entities serve as input to others). Box 5.7 provides more detail.
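At the topology-only level, one simple (and deliberately crude) reverse-engineering strategy is to connect two nodes whenever their measured profiles are strongly correlated across conditions. The gene names, expression profiles, and threshold below are invented for illustration; real methods must also contend with noise, indirect correlations, and multiple testing.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def infer_topology(profiles, threshold=0.8):
    """Non-directional network reconstruction: connect two nodes when the
    absolute correlation of their profiles exceeds `threshold`."""
    names = sorted(profiles)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(profiles[a], profiles[b])) >= threshold:
                edges.add((a, b))
    return edges

# Toy expression profiles over six conditions: geneB tracks geneA,
# geneC is anticorrelated with geneA (e.g., repressed), geneD is unrelated.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.1, 2.2, 2.9, 4.1, 5.0, 6.2],
    "geneC": [6.0, 5.1, 3.9, 3.1, 2.0, 0.9],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 1.0],
}
edges = infer_topology(profiles)
# Expected edges: (geneA, geneB), (geneA, geneC), (geneB, geneC)
```

The output is exactly the non-directional interaction diagram of the topology level; inferring causal direction, the next level of detail, requires perturbation data or temporal ordering.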
Pathway Reconstruction: A Systems Approach. On Topology. In this level, we are only concerned with identifying the interaction between nodes (genes, proteins, metabolites, etc.) in the system. The goal is the generation of a diagram of non-directional (more...)
An example of a cellular simulation environment is E-CELL, an open-source system for modeling biochemical and genetic processes. Organizationally, E-CELL is an international research project aimed at developing theoretical and functioning technologies to allow precise “whole cell” simulation; it is supported by the New Energy and Industrial Technology Development Organization (NEDO) of Japan.
E-CELL simulations allow a user to model hypothetical virtual cells by defining functions of proteins, protein-protein interactions, protein-DNA interactions, regulation of gene expression, and other features of cellular metabolism.47 Based on reaction rules that are known through experiment and assumed concentrations of various molecules in various locations, E-CELL numerically integrates differential equations implicitly described in these reaction rules, resulting in changes over time in the concentrations of proteins, protein complexes, and other chemical compounds in the cell.
Developers hope E-CELL will ultimately allow investigators a cheap, fast way to screen drug candidates, study the effects of mutations or toxins, or simply probe the networks that govern cell behavior. One application of E-CELL has been to construct a model of a hypothetical cell capable of transcription, translation, energy production, and phospholipid synthesis with only 127 genes. Most of these genes were taken from Mycoplasma genitalium, the organism with the smallest known chromosome (the complete genome sequence is 580 kilobases).48 E-CELL has also been used to construct a computer model of the human erythrocyte,49 to estimate a gene regulatory network and signaling pathway involved in the circadian rhythm of Synechococcus sp. PCC 7942,50 and to model mitochondrial energy metabolism and metabolic pathways in rice.51
Another cellular simulation environment is the Virtual Cell, developed at the University of Connecticut Health Center.52 The Virtual Cell is a tool for experimentalists and theoreticians for computationally testing hypotheses and models. To address a particular question, mechanisms such as chemical kinetics, membrane fluxes and reactions, ionic currents, and diffusion are combined with a specific set of experimental conditions (geometry, spatial scale, time scale, stimuli) and applicable conservation laws to specify a concrete system of differential and algebraic equations. The experimental geometry may assume well-mixed compartments or a one-, two-, or three-dimensional spatial representation (e.g., experimental images from a microscope). Models are constructed from biochemical and electrophysiological data mapped to appropriate subcellular locations in images obtained from a microscope. A variety of modeling approximations are available, including pseudo-steady state in time (infinite kinetic rates) or space (infinite diffusion or conductivity). In the case of spatial simulations, the results are mapped back to experimental images and can be analyzed by applying the arsenal of image-processing tools familiar to a cell biologist. A study undertaken within the Virtual Cell framework is described below.
Simulation models can be useful for many purposes. One important use is to facilitate an understanding of what design properties of an intracellular network are necessary for its function. For example, von Dassow et al.53 used a simulation model of the gap and pair-rule gene network in Drosophila melanogaster to show that the structure of the network is sufficient to explain a great deal of the observed cellular patterning. In addition, they showed that the network behavior was robust to parameter variation upon the addition of hypothetical (but reasonable) elements to the known network. Thus, simulations can also be used to formally propose and justify new hypothetical mechanisms and predict new network elements.
Another use of simulation models is in exploring the nature of control in networks. An example of exploring network control with simulation is the work of Chen et al.54 in elucidating the control of different phases of mitosis and explaining the impact of 50 different mutants on cellular decisions related to mitosis.
Simulations have also been used to model metabolic pathways. For example, Edwards and Palsson developed a constraint-based genome-scale simulation of Escherichia coli metabolism (Box 5.8