By Thomas R. Collingwood, Ph.D., President, Fitness Intervention Technologies, Richardson, Texas; Robert Hoffman, Program Coordinator, Thomas and Means LLP, Huntersville, North Carolina; and Jay Smith, President, Integrated Fitness Systems and FitForce, Salem, Massachusetts
Few if any law enforcement personnel disagree with the notion that physical fitness is necessary for the safe and effective performance of certain critical and essential job functions. The more difficult question is, how fit do officers need to be? There is even more confusion as to how traditional measures of physical fitness, such as push-ups and sit-ups, can be underlying and predictive factors for the performance of those essential law enforcement job tasks.
For the last 30 years, the authors of this article have been actively involved in establishing physical fitness programs and standards in hundreds of municipal, state, and federal law enforcement agencies. Those agencies ask, how can we prove that being physically fit is job related? The confusion exists due to several issues:
- Practical concerns such as what physical training programs will develop the fitness required for the job and help prevent injuries
- Legal concerns such as disparate impact and age and disability discrimination
The Uniform Guidelines for Employee Selection Tests (see note 1) clearly require that, to be valid, physical fitness tests, standards, and programs must be job related and consistent with business necessity. Without data to document that job-relatedness, case law indicates that physical fitness tests, standards, and programs are at risk. Tests and standards must be significantly correlated with, and predictive of, performing essential functions of the job.
Using data collected in the last 15 years, it is now possible to document that fitness areas such as aerobic and anaerobic power, strength, flexibility, explosive power, and agility underlie specific task performance. This analysis presents conclusions supported by data collected from 34 physical performance standards validation studies performed on more than 5,500 incumbent officers representing 75 federal, state, and local law enforcement agencies. The officer samples from each agency were stratified by age and gender, and they were randomly selected. Consequently, the data are reflective of the demographic characteristics of each agency. Given the size of the sample, we suggest that the results can be generalized as being applicable to law enforcement officers in general.
Standardized Validation Study Method
The 34 studies were all construct and criterion validation studies. They were designed to assess the accuracy of a physical fitness test as a predictor of an officer's ability to perform physical job tasks (the criterion). Construct and criterion validation are two of the three methods that the Uniform Guidelines accept as proof that a test and standard are job related. Each study followed the same basic procedures to determine the physical tasks of the job and to identify which physical fitness areas predicted safe and effective performance of those physical tasks. The steps for each study were as follows:
1. Researchers reviewed records such as job descriptions, injury reports, and use-of-force reports to identify critical physical tasks.
2. A stratified random sample of officers completed a job task analysis to determine frequent and critical tasks.
3. Supervisors incorporated the critical and frequent physical tasks identified in steps 1 and 2 into job-task simulation scenarios. Since job tasks are seldom performed in isolation, we chose to sequence them in real-world scenarios. These scenarios became the criterion measures against which the predictability of the fitness tests would be assessed.
4. We hypothesized that certain physical fitness areas were the underlying and predictive factors for performing the job tasks. Accepted field tests measuring those fitness areas were incorporated into a physical fitness battery.
5. Stratified random samples of incumbent officers completed both the job-task simulation scenarios and the fitness test battery.
6. Researchers statistically analyzed the results to determine which fitness tests were related to each job-task simulation test and to ascertain the strength of those relationships. Univariate (correlations) and multivariate (multiple regression) statistics were applied.
7. Subject matter experts (SMEs) and sample statistics defined the minimally acceptable level of performance on the job-task simulation scenarios.
Researchers evaluated selected fitness test scores to see which cut points predicted most accurately who did and who did not perform the job-task scenarios at an acceptable level. The result of each validation study was as follows:
- The definition of fitness tests with strong predictive relationships to performance on job-task simulation scenarios
- The fitness scores on each test that accurately predicted who could and who could not perform the job-task simulation scenarios at a minimally acceptable level
Analysis of Validation Study Results
The results of the validation studies provide data that suggest which fitness areas are underlying and predictive of safe and effective performance of law enforcement physical tasks.
Physical task ratings to define frequent and critical physical tasks: Incumbent officers from the various studies tended to rate the frequency and criticality of physical tasks similarly. Criticality ratings had between 85 percent and 100 percent agreement, while task frequency ratings had between 50 percent and 92 percent agreement. Incumbent officers consistently rated the tasks listed in figure 1 as the most critical and frequent.
Job task simulation scenarios as the criterion test measures: In each study, the agency's SMEs reviewed the job-task analysis (JTA) data and developed job-task simulation scenarios. The quantifying data, such as distances, heights, weights, and widths, were all based on JTA data with SME agreement on the final scenario parameters. Each study's SMEs demonstrated considerable agreement as to what were the most critical or frequent physical tasks and developed similar job-task simulation scenarios. As a consequence, the data from the 34 studies can be compared across studies.
In general, the critical and frequent tasks were operationalized into three basic events containing the specific tasks:
- Roadway clearance, involving lifting, carrying, and dragging debris, and pushing a car
- Victim extraction, involving sprinting to a disabled vehicle and lifting and dragging a dummy to safety
- Sustained foot pursuit, involving running up stairs, dodging, jumping, climbing a fence, crawling, vaulting obstacles, striking and moving a dummy, and simulated cuffing using resistance bands
Since the job-task simulation tests served as the criterion measures, it was important that they have content validity and be truly reflective of the real physical tasks of the job. Otherwise, the fitness test predictability results would not be valid indicators of job-relatedness. To assure that the job-task simulation tests were realistic representations of what officers must do on the job, agency SMEs and officers selected for the testing evaluated the realism of each scenario upon completion. Approximately 95 percent of participating officers rated each scenario as being either a situation they have personally performed or would be expected to perform. These officer ratings, along with the job-task analysis data, provided concurrent validation that the scenarios are representative of the physical tasks officers must perform, and, as a result, the job-task simulation tests have content validity.
Physical fitness ratings to define underlying fitness factors: The job-task analysis surveys required incumbents to rate the importance of 10 fitness factors (see figure 2). There was between 90 and 100 percent agreement among the officers in the different studies as to what were the important physical fitness factors for performing the job.
Physical Fitness Tests
The tests in the physical fitness battery measured the fitness factors the incumbent officers rated as important and necessary to perform the job. We used the same fitness battery in the majority of the studies. In some studies, we made changes due to logistical difficulties, agency requests, or legal concerns. For example, the leg press was used in only 25 percent of the studies. We dropped it from the battery because few agencies had access to that specific equipment. In its place, we used the vertical jump, which has a leg strength component, appeared to be more predictive, and is much less of a logistical challenge to conduct.
We conducted body composition assessments in only 16 percent of the studies. Some agencies did not want it included due to estimation inaccuracies. Body composition estimates were not used in any of the analyses because they are not performance assessments but rather static indicators of health.
Agility is a motor skill, not a component of fitness. Therefore, we did not include an assessment of agility in our earlier studies. We soon recognized, however, that agility is underlying and predictive of the ability to perform certain essential physical functions. We began administering the Illinois agility run in subsequent studies, meaning that 67 percent of our studies included that test.
Likewise, handgrip strength is not a component of fitness. We only included a test for handgrip strength when asked to do so by the agency. In the 6 percent of the studies that included that test, we learned that the area it measures is not important to the performance of essential physical functions.
All of the fitness tests selected are accepted within the field as valid measures of the fitness areas being tested. The fitness test batteries in all studies consisted of some combination of the physical factors and tests contained in figure 2.
Analysis of Fitness Tests' Predictability
Determining the fitness areas that are the underlying and predictive factors for performing essential physical tasks required two basic analyses. A correlational and regression statistical analysis documented the strength of each physical fitness area (as measured by a physical fitness test) as an underlying factor for performing the physical job tasks (as measured by the job-task simulation tests). Tests must demonstrate a predetermined correlation in order to be valid predictors. A specificity and sensitivity analysis determined how well each fitness test score predicted those officers who could and could not perform the job-task simulation tests at an effective level. This analysis determines which fitness test scores are used as standards. What follows is a brief explanation of the statistical procedures we employed.
Correlation: A Pearson Product Moment Correlation Coefficient (r) is a statistic that displays the strength of a relationship between two variables. It is expressed as a number that ranges between +1.00 and -1.00. The closer the r is to either +1.00 or -1.00, the stronger the implication that one factor is predictive of the other. Negative correlations indicate an inverse relationship. For example, a faster time (lower number) on the 1.5-mile run indicates a better level of cardiovascular fitness (higher number). Correlations do not imply direct causation but do imply a strong enough relationship so that some level of predictability exists. Let's assume that the push-up had an r equal to -.61 with the roadway clearance scenario. That tells us that there is a strong relationship between the ability to do push-ups and clearing the roadway more quickly.
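As a rough illustration of the statistic described above, the following minimal Python sketch computes r for two sets of scores. The function name `pearson_r` and the officer data are hypothetical, made up for this example; they are not values from the validation studies.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up illustration data (NOT from the studies): push-up counts and
# roadway clearance times in seconds for eight hypothetical officers.
push_ups = [12, 18, 22, 25, 30, 34, 41, 45]
clearance_seconds = [118, 110, 104, 99, 97, 90, 84, 80]

r = pearson_r(push_ups, clearance_seconds)
print(f"r = {r:.2f}")  # negative: more push-ups go with shorter (better) times
```

Because the scenario is timed, a strong relationship shows up as a negative r, which matches the inverse relationship described above.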
Regression: Multivariate analyses are statistical procedures that clarify the underlying structure of many variables. This type of analysis is especially useful for demonstrating validity because it evaluates the relationship between a group of fitness tests, taken together, and the job-task simulation tests, rather than the relationship between each individual fitness test and those simulation tests. If the criterion test represents the ability to do the job, and the regression analysis indicates that a group of test items predicts the ability to perform the job-task simulation tests, it follows that the fitness tests predict the ability to do the job. If a fitness test emerges as a significant factor in a regression analysis, that fact further supports the theory that the test is an underlying and predictive factor.
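To make the idea of a group of fitness tests jointly predicting a scenario score more concrete, here is a minimal ordinary least squares sketch in plain Python. It is only an illustration under stated assumptions: the function names and the data are hypothetical, and the sketch reports coefficients and R-squared only. Testing whether each fitness test is a significant factor, as the studies did, would normally be done with a statistics package and is not shown.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def fit_multiple_regression(predictors, outcome):
    """Ordinary least squares. Returns (coefficients, r_squared).

    predictors: one row of fitness test scores per officer.
    coefficients[0] is the intercept; the rest follow the column order.
    """
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - p) ** 2 for y, p in zip(outcome, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return beta, 1.0 - ss_res / ss_tot

# Made-up illustration data (NOT from the studies):
# columns are push-ups and vertical jump (inches); outcome is scenario time (s).
fitness = [[12, 14], [18, 17], [22, 15], [25, 19],
           [30, 18], [34, 22], [41, 21], [45, 25]]
scenario_seconds = [118, 110, 104, 99, 97, 90, 84, 80]

coefficients, r_squared = fit_multiple_regression(fitness, scenario_seconds)
print("intercept and slopes:", [round(c, 3) for c in coefficients])
print("R-squared:", round(r_squared, 3))
```

R-squared here is the share of variation in the scenario times that the fitness tests explain together, which is the group-level relationship the regression analysis is meant to capture.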
Specificity and sensitivity: These two terms reflect how accurately a score on a particular test predicts performance. The value of any fitness test cut point depends on how well it correctly identifies which individuals have an ability and how well it controls for the measurement error associated with any test. Specificity and sensitivity are defined as follows:
- Specificity: the percentage of individuals who fail the fitness test and also fail the job-task simulation tests
- Sensitivity: the percentage of individuals who pass the fitness test and also pass the job-task simulation tests
For the 34 validation studies, we required a minimum of 70 percent for both specificity and sensitivity. That means for a fitness test score to become a standard, it had to predict with at least 70 percent accuracy which officers could perform the job-task simulation tests at the effective level and who could not. Having both 70 percent specificity and sensitivity results in a standard that is highly predictive and, as such, is acceptable as being job related.
For example, let's say we found 25 push-ups to have a specificity level of 82 percent with the minimum effective time on the roadway clearance. That tells us that an officer who can't perform at least 25 push-ups is at least 82 percent likely to be unable to perform the roadway clearance effectively. Said another way, 82 percent of the officers in the tested sample who did fewer than 25 push-ups also failed to complete the roadway clearance in the minimally effective time.
Using the same push-up example, let's say the sensitivity rating is 84 percent. We would know that officers who can do 25 or more push-ups have 84 percent assurance of effectively clearing the roadway.
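The two percentages in the push-up example can be computed directly from paired results. The sketch below is a minimal illustration that follows this article's wording, in which both percentages are taken among officers grouped by their fitness test result (other fields define the terms by grouping on the actual outcome instead). The function names, the assumption that a higher fitness score is better, and the data are all hypothetical.

```python
def specificity_sensitivity(fitness_scores, scenario_passed, cut_point):
    """Return (specificity, sensitivity) as this article defines them.

    specificity: share of officers below the cut point who also failed the scenario
    sensitivity: share of officers at or above the cut point who also passed it
    Assumes a higher fitness score is better (e.g., push-ups); timed tests
    would need the comparison reversed.
    """
    paired = list(zip(fitness_scores, scenario_passed))
    failed_fitness = [passed for score, passed in paired if score < cut_point]
    passed_fitness = [passed for score, passed in paired if score >= cut_point]
    if not failed_fitness or not passed_fitness:
        raise ValueError("cut point leaves one group empty")
    specificity = sum(1 for p in failed_fitness if not p) / len(failed_fitness)
    sensitivity = sum(1 for p in passed_fitness if p) / len(passed_fitness)
    return specificity, sensitivity

def cut_points_meeting_standard(fitness_scores, scenario_passed, candidates,
                                minimum=0.70):
    """Return the candidate cut points where both percentages reach `minimum`."""
    acceptable = []
    for cut in candidates:
        try:
            spec, sens = specificity_sensitivity(fitness_scores, scenario_passed, cut)
        except ValueError:
            continue  # skip cut points that put every officer in one group
        if spec >= minimum and sens >= minimum:
            acceptable.append(cut)
    return acceptable

# Made-up illustration data (NOT from the studies): ten hypothetical officers.
push_ups = [10, 15, 18, 20, 24, 25, 28, 30, 35, 40]
cleared_roadway = [False, False, False, True, False, True, True, False, True, True]

spec, sens = specificity_sensitivity(push_ups, cleared_roadway, cut_point=25)
print(f"specificity = {spec:.0%}, sensitivity = {sens:.0%}")
print(cut_points_meeting_standard(push_ups, cleared_roadway, [20, 25, 30]))
```

Scanning several candidate scores, as the second function does, mirrors the way a score qualifies as a standard only when both percentages reach the 70 percent minimum.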
Statistical significance: This term relates to the degree of confidence one can have that the results obtained are not due to chance but are due to a true relationship. Specific statistical procedures are applied to test for the significance of any finding. Usually the .05 level is accepted as the lowest level of confidence of a true finding. It means that the probability of the results being due to chance is 5 out of 100. A .01 level is 1 out of 100, and .001 is 1 out of 1,000. How high the correlation must be to be significant depends on the size of the sample. For example, with a large enough number of tested individuals, it is possible to obtain a statistically significant correlation at the .05 level between two factors with an r of only .19. In our studies, we usually required a correlation of at least r = .50 to suggest a moderately high relationship.
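The dependence of significance on sample size can be sketched numerically. The example below uses the Fisher z-transformation with a normal approximation, which is a common textbook shortcut and not necessarily the procedure used in the studies; the function name `approx_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-sided p-value for a correlation r observed on n people.

    Uses the Fisher z-transformation: atanh(r) * sqrt(n - 3) is treated as
    a standard normal score under the hypothesis of no true correlation.
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# The same weak correlation (r = .19) at several sample sizes
for n in (30, 60, 110, 500):
    print(f"n = {n:>3}: p is about {approx_p_value(0.19, n):.3f}")
```

Under this approximation, r = .19 crosses the .05 level at roughly a hundred tested individuals, which is why a separate minimum such as r = .50 is useful for judging the strength of a relationship rather than its mere significance.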
Criteria for Evaluating the Job-Relatedness of Each Physical Fitness Test
Now comes the answer to the most important question: Which physical fitness tests appear to be the most underlying and predictive factors for the performance of the essential job tasks? We applied two sets of criteria to each validation study's results to determine the job relatedness of each fitness test.
Criteria for a given fitness test measuring an underlying factor for performing job tasks: Almost all the physical fitness tests had large percentages of statistically significant correlations to scenario scores. In order to identify the tests with the strongest relationships with a given scenario, we applied three criteria to each fitness test:
- A fitness test had to have an average correlation of at least .50 with a given scenario across all studies.
- At least 50 percent of a fitness test's significant correlations with a given scenario had to be over r = .50 across all studies.
- A fitness test had to be a significant factor in at least 50 percent of the regressions in all of the validation studies.
Criterion for a given fitness test to be a predictive factor for performing job tasks: We analyzed the various fitness test cut points to determine how well each score predicted effective performance of the essential physical tasks, i.e., job-task simulation scenarios. As noted above, in the 34 validation studies, we considered a score for use as a standard only if at least 70 percent of the officers who passed the fitness test at that score also passed the job-task simulation scenario and at least 70 percent of the officers who failed the fitness test also failed the scenario. A fitness test had to have a predictive cut point that met that 70/70 criterion.
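The three correlation and regression criteria in this section can be checked mechanically once each study's results are tabulated. The following is a minimal sketch under stated assumptions: the function name, the input layout (one entry per study), and the data are hypothetical, and correlations are compared by absolute value because timed scenarios yield negative r values.

```python
def check_underlying_factor_criteria(correlations, significant, in_regression):
    """Check one fitness test against one scenario across studies.

    correlations:  r from each study (sign ignored; assumption, see lead-in)
    significant:   per study, whether that r was statistically significant
    in_regression: per study, whether the test was a significant regression factor
    Returns a dict of criterion name -> True/False.
    """
    strengths = [abs(r) for r in correlations]
    average_ok = sum(strengths) / len(strengths) >= 0.50

    # Only the statistically significant correlations count for criterion 2
    sig_strengths = [s for s, sig in zip(strengths, significant) if sig]
    strong_share = (
        sum(1 for s in sig_strengths if s > 0.50) / len(sig_strengths)
        if sig_strengths else 0.0
    )
    regression_share = sum(1 for flag in in_regression if flag) / len(in_regression)

    return {
        "average_r_at_least_.50": average_ok,
        "half_of_significant_r_over_.50": strong_share >= 0.50,
        "significant_in_half_of_regressions": regression_share >= 0.50,
    }

# Made-up illustration data (NOT from the studies): five hypothetical studies.
results = check_underlying_factor_criteria(
    correlations=[-0.61, -0.55, -0.48, -0.52, -0.30],
    significant=[True, True, True, True, False],
    in_regression=[True, True, False, True, False],
)
for name, met in results.items():
    print(f"{name}: {'met' if met else 'not met'}")
print("criteria met:", sum(results.values()), "of 3")
```

In this made-up case the test falls short on the average correlation while meeting the other two criteria, which shows why the criteria are tallied separately rather than collapsed into a single number.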
Results: The Underlying and Predictive Physical Fitness Factors and Tests
Tests were accepted or rejected as being job related based on the number of criteria met for each scenario. For a fitness test to be a primary factor, it had to meet all four criteria. To be classified as a secondary factor, it had to meet three of the four criteria for each scenario. Using those guidelines, the underlying and predictive physical fitness tests are as follows:
Scenario 1. Lifting, carrying, dragging, pushing tasks
- Primary factors meeting all four criteria
  - Relative upper-body strength (one-repetition maximum (1RM) bench press ratio)
  - Agility (Illinois agility test)
- Secondary factors meeting three of four criteria
  - Anaerobic power (300-meter run)
  - Absolute upper-body strength (1RM bench press raw)
  - Explosive leg power (vertical jump)
Scenario 2. Lifting, dragging, and extracting tasks
- Primary factors meeting all four criteria
  - Absolute upper-body strength (1RM bench press raw)
  - Agility (Illinois agility test)
- Secondary factor meeting three of four criteria
  - Relative upper-body strength (1RM bench press ratio)
Scenario 3. Pursuit involving running up stairs, sustained running, dodging, jumping, climbing a fence, crawling, and vaulting obstacles
- Primary factors meeting all four criteria
  - Aerobic power (1.5-mile run)
  - Anaerobic power (300-meter run)
- Secondary factors meeting three of four criteria
  - Upper-body muscular endurance (push-up)
  - Abdominal muscular endurance (sit-up)
Factors to Perform Essential Physical Tasks
The data obtained from the 34 physical fitness standard validation studies indicate that certain physical fitness areas are the underlying and predictive factors or physical abilities that determine a law enforcement officer's capabilities to perform essential physical tasks. Those factors are as follows:
- Aerobic power as measured by the 1.5-mile run
- Anaerobic power as measured by the 300-meter run
- Upper-body absolute strength as measured by the 1RM bench press
- Upper-body muscular endurance as measured by the push-up test
- Abdominal muscular endurance as measured by the one-minute sit-up test
- Explosive leg power as measured by the vertical jump
- Agility as measured by the Illinois agility run
The implications of these findings are straightforward:
- Test for these areas to ensure that applicants, academy recruits, and incumbents have the physical abilities to perform the essential physical tasks of the job.
- Develop performance standards in these areas for utilization with applicants, academy recruits, and incumbents.
- Provide training programs that ensure that law enforcement recruits and incumbents have the skills and knowledge to maintain personal conditioning programs throughout their career.
1 Equal Employment Opportunity Commission, Uniform Guidelines for Employee Selection Tests (Washington, D.C.: U.S. Government Printing Office, 1978), available at www.access.gpo.gov/nara/cfr/waisidx_00/29cfr1607_00.html . Revised guidelines (as of July 1, 2000) have been incorporated in the Code of Federal Regulations at Title 41, Volumes 1 to 100.
Collingwood, Thomas R., Robert Hoffman, and Jay Smith. "The Need for Physical Fitness." Law and Order (June 2003): 44-50.
Collingwood, Thomas R., and H. Kohl. "Application of Specificity/Sensitivity Analysis to Define Physical Performance Standards for Public Safety Officers: Abstract." Medicine and Science in Sports and Exercise 26 (1994): 18.
Collingwood, Thomas R., H. Kohl, and R. Reynolds. "Application of Disease Screening Principles to Define Physical Performance Standards for Police Officers: Abstract." Medicine and Science in Sports and Exercise 24 (1992): 133.