Are Self-report Measures Able to Define Individuals as Physically Active or Inactive?

PURPOSE
Assess the agreement between commonly used self-report methods compared with objectively measured physical activity (PA) in defining the prevalence of individuals compliant with PA recommendations.


METHODS
Time spent in moderate and vigorous PA (MVPA) was measured at two time points in 1713 healthy individuals from nine European countries using individually calibrated combined heart rate and movement sensing. Participants also completed the Recent Physical Activity Questionnaire (RPAQ), short form of the International Physical Activity Questionnaire (IPAQ), and short European Prospective Investigation into Cancer and Nutrition Physical Activity Questionnaire (EPIC-PAQ). Individuals were categorized as active (e.g., reporting ≥150 min of MVPA per week) or inactive, based on the information derived from the different measures. Sensitivity and specificity analyses and Kappa statistics were performed to evaluate the ability of the three PA questionnaires to correctly categorize individuals as active or inactive.


RESULTS
Prevalence estimates of being sufficiently active varied significantly (P for all <0.001) between self-report measures (IPAQ 84.2% [95% confidence interval {CI}, 82.5-85.9], RPAQ 87.6% [95% CI, 85.9-89.1], EPIC-PAQ 39.9% [95% CI, 37.5-42.1], and objective measure 48.5% [95% CI, 41.6-50.9]). All self-report methods showed low or moderate sensitivity (IPAQ 20.0%, RPAQ 18.7%, and EPIC-PAQ 69.8%) to correctly classify inactive people, and the agreement between objective and self-reported PA was low (κ = 0.07 [95% CI, 0.02-0.12], 0.12 [95% CI, 0.06-0.18], and 0.19 [95% CI, 0.13-0.24] for IPAQ, RPAQ, and EPIC-PAQ, respectively).


CONCLUSIONS
The modest agreement between self-reported and objectively measured PA suggests that population levels of PA derived from self-report should be interpreted cautiously. Implementation of objective measures in large-scale cohort studies and surveillance systems is recommended.

Physical activity (PA) is one of the leading risk factors for noncommunicable diseases, and it has been suggested that physical inactivity is one of the greatest public health problems in the 21st century (37). Thus, increasing PA has been proposed as an important public health strategy. Many health authorities worldwide recommend that adults should engage in moderate-to-vigorous PA (MVPA) for at least 150 min·wk⁻¹ (9,26,38). Although total PA energy expenditure (PAEE) and time spent in different intensities are important dimensions when measuring PA in relation to health outcomes, accurate measures of the proportion of the population meeting the above-mentioned recommendations are fundamental for public health policy and informing intervention strategies.
For practical reasons, PA questionnaires (PAQs) are the most commonly used assessment method in large population-based cohort studies and surveillance systems. Based on data from 122 countries, 31.1% of adults worldwide were estimated to be inactive (e.g., not meeting PA recommendations), with substantial between-country variation (19), and data from Europe (5) reveal large variation in population levels of sitting even across industrialized countries, with levels ranging from 191 to 407 min·d⁻¹. However, there is a paucity of data documenting the agreement between different PAQs for correctly classifying individuals as physically active (e.g., meeting the adult PA recommendations of greater than 150 min of MVPA per week) compared with an objective criterion method.
Frequently used PAQs include the International Physical Activity Questionnaire (IPAQ) (11), which has been used in large scale population surveys such as the Eurobarometer (27) and the World Health Organization (WHO) world health survey (17); the Recent Physical Activity Questionnaire (RPAQ), which is used in the Fenland study and the National Diet and Nutrition Survey in the UK (6,13); and the short EPIC Physical Activity Questionnaire (EPIC-PAQ) (36) used in one of the largest pan-European cohorts including approximately 520,000 individuals (29). These three PAQs have been extensively validated, and the overall results reveal validity for ranking individuals and group level assessment of PAEE, MVPA, and sedentary time (11,16,23). However, reported associations with criterion measures rarely exceed correlations of 0.3 (28) and, although significant, should be interpreted as low to moderate. In addition, there are several well-known limitations with self-report, especially with regard to cultural differences, recall bias, and misinterpretation of questions (10). Overreporting of PA seems to be a measurement issue with respect to IPAQ (15,25,31), and a recent validation of RPAQ revealed higher estimates of PAEE and MVPA compared with an objective criterion measure (16). Taken together, this might result in an overestimation of the proportion of respondents being categorized as sufficiently physically active when PA is assessed with self-report.
Accurate data on population levels of PA are required for policymakers and researchers to be able to answer fundamental public health questions by means of exploring trends in PA behavior both within and across countries, to evaluate the effect of different initiatives, and to be able to reach target populations. Thus, based on the large amount of data available in European adults using different PAQs, the main aim of the current study was to assess the agreement between different commonly used self-report methods compared with objectively measured PA (criterion method) in defining the prevalence of individuals compliant with PA recommendations.

METHODS
Details of the study population and study design have been published elsewhere (16,23). In short, a convenience sample of healthy individuals from 10 European countries was recruited based on a center-specific age and sex distribution similar to the original EPIC-Europe cohort (29).  [n = 196]). In France and Norway, only women were included. In addition, only those individuals with complete data on objectively measured PA and all three PAQs are included in the analyses. In Umeå (Sweden), the original EPIC-PAQ was not used, and consequently, all participants from Sweden were excluded from the present analyses. The final study population therefore included 1713 participants. The study consisted of two visits held 4 to 5 months apart (mean time between visits = 4.5 months; SD = 1.0). Height, weight, and free-living PA were measured at both visits according to standard procedures (23) with the additional administration of IPAQ, RPAQ, and EPIC-PAQ at the second visit. For standardization and quality control across centers, the MRC Epidemiology Unit staff organized a workshop before testing and visited study centers during the testing phase. Each center obtained ethical approval from a local ethics board before participant recruitment, and informed consent was obtained from all participants.

Objective PA Measurement
Physical activity was objectively measured using a combined heart rate (HR) and movement sensor (Actiheart, CamNtech Ltd, Cambridge, UK) attached to the chest via standard ECG electrodes. All participants performed an 8-min submaximal ramped step test (200-mm step; Reebok, Lancaster, UK) to determine the individual relationship between HR and workload (8). After the step test, the Actiheart sensor was reinitialized to collect data in 1-min epochs, and the participants were instructed to wear the monitor constantly (24 h·d⁻¹) for at least four consecutive days with a mean wear time for both measuring periods of 4.7 (1.0) d. We excluded all participants with less than 3 d of wear data. Furthermore, MVPA was averaged across the two 4-day measurements, using the mean of each.
Physical activity intensity (J·min⁻¹·kg⁻¹) for each time point was estimated from the combination of movement registration and individually calibrated HR (8) using a branched equation framework (7). To handle potential measurement noise in the HR trace, HR data from free-living was preprocessed (32), and nonwear periods were identified from the combination of nonphysiological HR and prolonged periods (>60 min) of inactivity. This method yields quantification of uncertainty (error bars on the estimate of latent HR at any given time), which is heavily influenced by the randomness of the measurement (which is nonphysiological over short time scales), and we simply use the size of this uncertainty in combination with prolonged periods of no movement to decide if it is likely that the person was wearing the monitor or not. The threshold for MVPA was set at greater than 3.5 MET, with 1 MET based on the standard definition (32), to closely match the MET values derived for reported time spent walking from IPAQ (3.3 METs) and RPAQ (3.5 METs). For the purpose of defining the individuals as inactive or active, the threshold for meeting PA recommendations was defined as achieving at least 150 min·wk⁻¹ of moderate-to-vigorous activity (>3.5 MET). We included all time above this level, without any stipulation of bouts.
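The classification rule described above can be illustrated with a minimal Python sketch. It assumes that per-minute intensity is already available in METs for each valid day and that a weekly total is obtained by multiplying the mean daily MVPA minutes by 7; that scaling step, and the function names, are assumptions for illustration and are not stated in the text.

```python
from statistics import mean

MVPA_THRESHOLD_MET = 3.5   # from the text: MVPA is intensity greater than 3.5 MET
WEEKLY_TARGET_MIN = 150    # from the text: at least 150 min per week


def daily_mvpa_minutes(minute_mets):
    """Count 1-min epochs above the MVPA threshold, with no bout requirement."""
    return sum(1 for met in minute_mets if met > MVPA_THRESHOLD_MET)


def meets_recommendation(days_visit1, days_visit2):
    """Average daily MVPA within each visit, then across the two visits.

    Each argument is a list of days; each day is a list of per-minute METs.
    Scaling the daily mean to a week (x7) is an assumption of this sketch.
    """
    visit_means = [
        mean(daily_mvpa_minutes(day) for day in visit)
        for visit in (days_visit1, days_visit2)
    ]
    weekly_minutes = 7 * mean(visit_means)
    return weekly_minutes >= WEEKLY_TARGET_MIN, weekly_minutes
```

For example, a participant averaging 30 min per day above 3.5 MET at both visits would accumulate 210 min per week under this scaling and be classified as active.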

PA Questionnaires
The three different PAQs were electronically administered as previously described (6,11,16,23). Before the objective monitoring of PA, each participant completed the short EPIC-PAQ, the short version of the IPAQ, and the RPAQ. All PAQs were translated from English to each of the specific languages and then back-translated before administration (23).
In short, EPIC-PAQ is composed of four questions related to PA during the last year. The four questions cover the following: 1) category of occupational activity, 2) participation in several activities (walking, cycling, do-it-yourself, gardening, sports, and household chores) during both summer and winter, 3) participation in vigorous nonoccupational activity, and 4) number of floors or stair flights climbed per day. With the information on occupational category and time spent in sports and cycling as basis, we derived the Cambridge index (36). To define individuals as active or inactive, we collapsed these categories into either not meeting PA recommendations (inactive and moderately inactive) or meeting PA recommendations (moderately active and active). Participants categorized as inactive reported an inactive occupation and no leisure time PA, and those categorized as moderately inactive reported both a sedentary occupation and less than 3.5 h·wk⁻¹ of moderate-to-vigorous intensity leisure time PA or a standing occupation in combination with no leisure time PA.
The Recent Physical Activity Questionnaire consists of nine different questions referring to the last 4 wk. The RPAQ is composed of closed questions with ordered categories of frequencies paired with duration. The RPAQ covers four domains of PA: domestic life, work, recreation, and transport. The domestic section includes questions about TV viewing, computer use, and stair climbing. The Modified Tecumseh Occupational Activity Questionnaire (1) was adopted for deriving the occupational categories of PA (mostly sitting, standing, manual, or heavy manual). Questions about recreational PA were adopted from The Minnesota Leisure Time Activity Questionnaire (30), and transport related PA was categorized as walking, cycling, and use of car/public transport. To estimate time spent in different intensities, all activities were categorized as follows: sedentary (<1.5 METs), light (1.5 to <3 METs), and MVPA (>3 METs). Based on time (min·wk⁻¹) spent greater than 3 METs, those individuals achieving at least 150 min·wk⁻¹ of MVPA were categorized as meeting PA recommendations. The majority of recreation activities accessible to report were referring to MET scores above 3.5 METs.
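The collapsing of the four Cambridge index categories into a binary outcome can be written as a simple lookup. The sketch below uses the category labels from the text; the dictionary and function names are hypothetical, and it takes the four-level category as already derived rather than reproducing how the index itself is computed.

```python
# Binary collapse of the four Cambridge index categories named in the text.
# True means "meeting PA recommendations"; names here are illustrative only.
MEETS_RECOMMENDATION = {
    "inactive": False,
    "moderately inactive": False,
    "moderately active": True,
    "active": True,
}


def collapse_cambridge_index(category):
    """Map a four-level Cambridge index label to active (True) or inactive (False)."""
    return MEETS_RECOMMENDATION[category.strip().lower()]
```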
The short, last 7 d, IPAQ asks the respondents to report time (i.e., number of sessions and average time per session) spent walking (3.3 METs), in moderate intensity PA (4.0 METs), in vigorous intensity PA (8 METs), and sitting (<1.5 METs) (only weekdays). Questions regarding intensity were supplemented by examples of commonly performed activities. Based on the information within each intensity category, we estimated the total amount of time spent in PA per week. MVPA (min·wk⁻¹) was estimated by summing the reported time spent walking (3.3 METs) and in MVPA (>4 METs) intensity, and those individuals achieving at least 150 min·wk⁻¹ of MVPA were then categorized as meeting the PA recommendations.
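As a rough illustration of the IPAQ scoring described above, the following sketch sums reported walking, moderate, and vigorous time and applies the 150 min·wk⁻¹ threshold. The input layout (sessions per week, minutes per session) follows the wording in the text; the function names are hypothetical, and this is not the official IPAQ scoring protocol.

```python
def ipaq_weekly_mvpa(walking, moderate, vigorous):
    """Total self-reported minutes per week.

    Each argument is a (sessions_per_week, minutes_per_session) pair.
    """
    return sum(
        sessions * minutes
        for sessions, minutes in (walking, moderate, vigorous)
    )


def ipaq_is_active(walking, moderate, vigorous, target=150):
    """Classify as meeting recommendations at >= target minutes per week."""
    return ipaq_weekly_mvpa(walking, moderate, vigorous) >= target
```

A respondent reporting five 20-min walks and two 30-min moderate sessions would total 160 min per week and be classified as active, which shows how readily walking time alone can carry a respondent toward the threshold.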

Statistics
Values in tables are presented as mean ± SD, unless otherwise stated. Differences in participants' age and BMI were assessed by independent samples t test. Based on data from each PAQ, differences were assessed between proportions of individuals meeting PA recommendations using chi-square analyses. Sensitivity and specificity analyses with 95% confidence intervals were performed to evaluate the PAQ's ability to correctly categorize individuals as active or inactive using combined HR and motion sensing as criterion method. Specificity is the PAQ's ability to correctly identify an individual as physically active, whereas sensitivity refers to the ability to correctly identify individuals as physically inactive. Kappa statistics were used to evaluate the level of agreement between PAQs and the criterion method in defining prevalence of compliance with recommendations. Random effect meta-analyses were used to calculate the combined agreement across countries. Heterogeneity across countries in the agreement of each PAQ was evaluated using Forest plots and assessed using I-squared (I²) statistics. Kappa coefficients of 0.81 to 1.00 are generally interpreted as very good, 0.61 to 0.80 as good, 0.41 to 0.60 as moderate, 0.21 to 0.40 as fair, and less than 0.20 as poor (2). Finally, Pearson's correlation coefficient (r) was used to evaluate the relationship (on country level) of the proportion meeting the PA recommendations from each of the three self-report instruments with the objective criterion measure.
All analyses were performed in IBM SPSS statistics version 21 except for the random effect meta-analyses, which were performed using STATA version 13.0 (StataCorp, College Station, TX). Threshold for significance was set at P < 0.05.
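The agreement statistics defined above can be sketched in a few lines of Python. The code follows the convention used in this study, where sensitivity is the proportion of criterion-inactive individuals correctly classified as inactive and specificity is the proportion of criterion-active individuals correctly classified as active. It is an illustrative sketch with hypothetical function names, not the SPSS or STATA procedures used for the analyses, and it omits confidence intervals. Treating a kappa of exactly 0.20 as poor is an assumption, since the text leaves that boundary unspecified.

```python
def agreement_stats(paq_active, criterion_active):
    """Return (sensitivity, specificity, kappa) for paired binary classifications.

    Both inputs are equal-length sequences of booleans (True = active).
    Assumes the criterion contains at least one active and one inactive person.
    """
    pairs = list(zip(paq_active, criterion_active))
    n = len(pairs)
    both_inactive = sum(1 for p, c in pairs if not p and not c)
    both_active = sum(1 for p, c in pairs if p and c)
    paq_only_active = sum(1 for p, c in pairs if p and not c)
    paq_only_inactive = sum(1 for p, c in pairs if not p and c)

    # Study convention: sensitivity targets the inactive group.
    sensitivity = both_inactive / (both_inactive + paq_only_active)
    specificity = both_active / (both_active + paq_only_inactive)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (both_active + both_inactive) / n
    p_paq = (both_active + paq_only_active) / n
    p_crit = (both_active + paq_only_inactive) / n
    expected = p_paq * p_crit + (1 - p_paq) * (1 - p_crit)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


def interpret_kappa(kappa):
    """Label kappa using the bands cited in the text."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

Note that a questionnaire classifying nearly everyone as active can reach high specificity while sensitivity and kappa stay low, which is why the three statistics are reported together.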

RESULTS
The baseline characteristics stratified by study location are shown in Table 1. Across all locations, the average age was 54.7 (SD 9.5) yr, and the average BMI was 25.7 (SD 4.0) kg·m⁻². The majority of the study population were women (72%), and they were younger (P = 0.009) and leaner (P < 0.001) compared with men. Mean age varied across countries ranging from 48.4 to 61.5 yr. BMI also varied across countries, with Greek women and men having the highest BMI (27.0 and 27.7 kg·m⁻², respectively), whereas the women and men from the Netherlands had the lowest BMI (22.6 and 23.5 kg·m⁻², respectively). Table 2 displays the prevalence of participants meeting PA recommendations according to different PAQs and the objective measure by country, sex, and overall. The results reveal substantial discrepancies in prevalence estimates from the three PAQs. Overall estimations based on IPAQ and RPAQ suggested that more than four of five participants were categorized as sufficiently active, whereas the proportion of participants categorized as sufficiently active was 39.9% when estimated from the short EPIC-PAQ and 48.5% based on the objective measure. There were no sex differences in the proportions meeting PA recommendations based on IPAQ (P = 0.991) and objectively measured PA (P = 0.098), whereas a lower proportion of women were meeting PA recommendations based on RPAQ (P = 0.015) and EPIC-PAQ (P = 0.003). A similar pattern was found within all countries revealing that the prevalence of being categorized as active was highest when based on results from RPAQ and IPAQ, whereas prevalence estimates were consistently lower when based on EPIC-PAQ or the objective measure. However, comparing prevalence estimates within each PA measure between countries, the results revealed significant differences (P for all < 0.001), with prevalence of meeting PA recommendations based on EPIC-PAQ ranging from 17.6% in Greece to 54% in Norway.
For both RPAQ and IPAQ, respectively, participants from Greece had the lowest prevalence (74.2% and 67.0%), whereas the Netherlands showed the highest prevalence (99.0% and 94.7%). Finally, based on the objective measure, 24.5% of UK participants and 65.5% of the participants from the Netherlands were categorized as meeting PA recommendations. Figure 1A to Using objectively measured PA as the criterion method, we found good specificity but poor sensitivity for IPAQ and RPAQ when evaluating the ability of capturing participants meeting PA recommendations (specificity) or not meeting PA recommendations (sensitivity) (Table 3). In detail, the overall specificity for IPAQ to capture sufficiently active individuals was 88.7%, whereas only 20.0% were correctly captured as insufficiently active. A similar pattern was found for RPAQ where specificity and sensitivity were 94.2% and 18.8%, respectively. For EPIC-PAQ, results reveal slightly lower specificity (50.2%) but better sensitivity (69.8%). Furthermore, Table 3 provides a more detailed description of the sensitivity and specificity by sex and country, revealing large differences between countries.

DISCUSSION
We present data on the ability of three commonly used PAQs to assess the prevalence of sufficiently active adults compared with objectively measured PA from nine European countries. Our results demonstrate substantial discrepancies in prevalence estimates of being sufficiently physically active derived from the three PAQs. Moreover, all three self-report methods showed low-to-moderate sensitivity to correctly classify inactive people, and the agreement with the objective PA measure was low. Both IPAQ and RPAQ tended to substantially overestimate the number of people meeting the PA recommendations, whereas EPIC-PAQ underestimated the number of people not meeting the PA recommendations, which was illustrated with lower specificity than that found for IPAQ and RPAQ. Results from the present study confirm recent findings that the proportion of individuals being categorized as sufficiently active varies substantially between countries (19). Although the same PAQs and objective measure were used in each country, there are well-known limitations and potential measurement errors in self-reported PA. However, the objective measure used in the present study confirmed differences between countries suggesting that previously reported differences between countries are not entirely explained by differential bias in the PAQs. Thus, it is likely that the present and, to some extent, previously reported findings reflect geographical and cultural differences in overall PA level across European countries.
Nonetheless, our results indicate that the three self-report methods do not match well in ranking population levels of prevalence of individuals meeting PA recommendations. The low-to-moderate correlation observed on country level (Fig. 1A-C) suggests that cross-country comparison is difficult even if the same self-report instrument is used. For example, the IPAQ, which is used for surveillance purposes in the Eurobarometer, performs poorly when ranking countries according to their proportion of individuals meeting PA recommendations. We can only speculate on why countries differ in how methods disagree, but cultural differences in how people understand and interpret certain PA questions are likely the most obvious reason. However, the use of only nine data points needs to be considered when interpreting the present results. Nevertheless, our results indicate that observations from cross-country comparisons using IPAQ and RPAQ should be interpreted with some caution, and ideally, objective assessment methods should be used.
Another important question is whether PAQs are able to classify or identify sufficiently active or insufficiently active individuals according to recommendations compared with a criterion method. This information is important when deciding on the best methods available to answer fundamental public health questions such as exploring PA behavior in a population or evaluating the effect of different public health initiatives. The present findings reveal substantial differences in the proportion of individuals classified as meeting the current recommendations for PA between different self-report measures and also when compared with the objective measure (Table 3). These results are to some extent consistent with previous observations from the NHANES where Troiano et al (33) found that less than 5% met the PA recommendations based on accelerometer-derived results, whereas 51% met PA recommendations based on self-report questionnaire data. Others have also found limited ability of IPAQ to classify inactive people when compared with objectively measured PA. Dyrstad et al (14) showed that in a sample of 1751 Norwegian men and women, 67% of participants were categorized as sufficiently active by IPAQ, whereas the corresponding number for accelerometry was 22%. Similarly, Ekelund et al (15) showed that the sensitivity of IPAQ to capture insufficiently active individuals was only 45% in a sample of 187 Swedish adults. Moreover, data from many population-based studies using the IPAQ suggest that approximately three quarters of individuals meet or exceed 150 min·wk⁻¹ of MVPA (11,14,17). To summarize, the available data suggest that the proportion of individuals categorized as sufficiently active based on IPAQ seems substantially overestimated when compared with objective PA measures.
The present results also reveal low κ values, suggesting limited agreement between the PAQs and the objective measure in defining individuals as physically active according to proposed PA recommendations. For both IPAQ and RPAQ, the sensitivity to identify individuals not meeting PA recommendations was poor, whereas EPIC-PAQ showed somewhat better sensitivity. This suggests that a simple derived PA index may be superior to the interpretative framework overlaid on IPAQ and RPAQ when used to identify those who are physically inactive.
There are several factors potentially contributing to the observed large discrepancies and poor levels of agreement. First, self-reports are unreliable especially for housework and occupational activity; this may be particularly problematic in low- and middle-income countries, where transport, occupational, and housework activities often are mixed with daily life (20). Moreover, social desirability, recall bias, and cultural differences in perceptions of the meaning of PA could introduce systematic errors that might lead to overestimations of the respondents' PA level assessed by self-report (35). Rzewnicki et al (31) have suggested that a possible problem with IPAQ is that the respondents need to report an average time per day for each activity performed, which increases the likelihood that the respondent refers to the most active day. In addition, respondents have to calculate an average amount of time per day across many activities, which might also increase the possibility of overreporting. Participants in this study reported a daily average of 51, 28, and 17 min, respectively, of walking, MPA, and VPA based on IPAQ. Corresponding numbers derived from RPAQ were 98 min in MPA and 14 min in VPA, whereas values derived from the combined HR and movement sensing showed a daily average of 27 min in MPA and 1 min in VPA. These substantial differences in estimated time spent in different intensities from different PA measures underscore that overreporting is a major challenge affecting population prevalence estimates. Although standardized questionnaires (i.e., IPAQ and the Global Physical Activity Questionnaire) have been successfully implemented globally (19), our results suggest that estimates of population levels of PA and differences in these estimates between countries are likely overestimations of actual levels of PA and should be interpreted cautiously.
Second, the specific criteria used to categorize individuals as meeting or not meeting PA recommendations from self-report were somewhat arbitrary. Thus, it cannot completely be ruled out that this might have biased the results. For example, there is no standard method available for deriving prevalence estimates with respect to PA recommendations based on the RPAQ. Nevertheless, the criteria applied (at least 150 min·wk⁻¹ of activity >3 METs) are in agreement with the proposed guidelines (26,38). Moreover, when defining sufficiently active individuals according to the IPAQ, the original scoring protocol (www.ipaq.ki.se) was slightly modified. All individuals with a total self-reported PA level >150 MVPA min·wk⁻¹ were considered physically active in agreement with the data derived from the RPAQ. Thus, our criteria for categorizing individuals as sufficiently active are less strict compared with the original IPAQ scoring protocol. On the other hand, summarizing the total amount of activity per week regardless of how the accumulated time is distributed across days is in accordance with the latest recommendations in many countries (26,38). For EPIC-PAQ, we adopted exactly the same criteria for defining the prevalence of sufficiently active individuals (i.e., >150 MVPA min·wk⁻¹). On the other hand, the Cambridge index is based on categorizing individuals into four groups of PA based on occupation and recreational PA. Participants categorized as "inactive" or "moderately inactive" were classified as not being sufficiently active according to the 150 MVPA min·wk⁻¹ threshold. The "moderately inactive" category is defined as reporting a sedentary occupation in combination with <3.5 h of recreational activity, which is higher but as close as possible to the 150 MVPA min·wk⁻¹ threshold.
Furthermore, the Cambridge index seems accurate for ranking individuals according to their PA levels (23) and predicts increased risk for mortality (24), suggesting both criterion and face validity. Thus, using this simple derived PA index to assign participants into either active or inactive seems reasonable. MVPA was defined as equivalent to 3.5 METs or greater from our combined HR and movement sensing method to closely match the MET values used to define MVPA from IPAQ and RPAQ. For example, walking is defined to have an intensity of 3.3 METs and 3.5 METs in IPAQ and RPAQ, respectively. Furthermore, self-reported time in MVPA is defined as 3.5 METs or greater and 4 METs or greater in RPAQ and IPAQ (www.ipaq.ki.se), respectively. In sensitivity analyses using 3 METs as defining MVPA from our objective criterion method, we observed slightly improved but still poor agreement for IPAQ (κ = 0.138 [95% CI, 0.085-0.191]) and RPAQ (κ = 0.192 [95% CI, 0.137-0.247]), whereas agreement for EPIC-PAQ was slightly attenuated (κ = 0.114 [95% CI, 0.083-0.145]). Thus, the definition of MVPA from our objective measure did not affect the overall result that agreement for self-report measures is poor in general, although the relative performance of the three instruments depends on the definition of the criterion MVPA measure.
Finally, reference timeframe differs between all three PAQs. The EPIC-PAQ refers to PA during the last year, the RPAQ to the last 4 wk, whereas the IPAQ refers to the last 7 d. However, the poor agreement with our criterion method, which was equal in magnitude and evident for all three PAQs, was unlikely affected by the differences in recall periods. This is because time spent in MVPA from combined HR and movement sensing was estimated by the average of two time points 4 to 5 months apart.
Objective measures have the potential to overcome limitations associated with self-report, and accelerometers have been suggested as the minimum standard in epidemiological research (10). There is also recent work showing, at least among adults, that objectively measured PA is more strongly correlated with several cardiometabolic risk markers (i.e., lipids, triglycerides, insulin, and glucose) compared with self-report (3). This underscores the impression that devices might measure physiologically meaningful activity. In line with this, population-based surveys and observational cohort studies using objective assessment methods (i.e., accelerometry) have recently been successfully conducted in several countries (4,12,18,21), suggesting that objective measurement of PA is feasible in large scale cohort studies and PA surveillance systems in developed and developing countries.
Thus, it may be timely to increase the efforts to implement objective measures of PA in large scale surveillance systems. Although great progress has been made in this field, there are still comparability issues because of the variety of monitors and differences in study protocols, data cleaning, and data reduction procedures used. One possible solution to overcome comparability issues across brands of accelerometers might be the use of raw acceleration data rather than relying on proprietary activity counts. For example, raw accelerometer output from two different accelerometer brands seems comparable when attached to the same body location (22), suggesting that the output from different brands is comparable when expressed in common units of acceleration (i.e., milli-g).
Nevertheless, it should be acknowledged that measures derived from self-report and objective methods are not equivalent. As recently pointed out in a review by Troiano et al (34), summary measures are often expressed using the same metrics (i.e., PAEE, time spent in different intensities, or MET·min·wk⁻¹). However, combined HR and movement sensing quantifies the acceleration of the trunk in combination with individually calibrated HR to estimate PAEE over a short period, whereas self-report instruments attempt to quantify PA based on reported time engaged in specific behaviors. Thus, these two methods are in fact measuring different aspects of the concept of physical activity, which leads to challenges for direct comparison. Troiano et al (34) also argue that the epidemiological studies that are the basis for the PA recommendations rely on self-report, thereby questioning estimations of the proportion sufficiently physically active based on objective measures. On the other hand, few questionnaires are designed to estimate the prevalence of PA according to the recommendations. Moreover, the fact that we still are comparing results based on different measures highlights the importance of documenting measurement errors between self-report and objective measures.
Although measured at two visits 5 months apart, a limitation of our objective method is that it provides only a snapshot (at least 1 d) of PA, and daily variability might have been better captured with a full week of monitoring. Moreover, an accelerometer located on the trunk is likely to underestimate certain activities such as cycling, swimming, or upper body movement, and single HR monitoring is a less valid measure of energy expenditure during sedentary and light activity (23). However, the combination of HR and movement sensing has the potential to circumvent some of the limitations of the two respective methods. We also acknowledge that our individual calibration could be limited by only using a step test calibration, and a treadmill would have been better. In addition, our estimation of time spent in MVPA included an accumulation of all minutes spent above the 3.5-MET threshold and did not consider, for example, continuous 10-min bouts, which is part of the current PA guidelines for public health (9,26,38). Thus, our estimate of time spent in MVPA likely overestimated the proportion of participants meeting PA recommendation from the objective criterion measure; however, such an overestimation would only imply an even larger discrepancy between objectively measured and self-reported PA.
Among the strengths of the present study is the large and diverse sample of men and women from nine different European countries in which data collection procedures and methods were standardized across study locations and PA assessed by combined HR and movement sensing at two different time points 5 months apart.

CONCLUSIONS
Our results reveal substantial differences in prevalence estimates between self-reported measures when assessing compliance with PA recommendations compared with an objective criterion measure. The three self-reports do not perform well in ranking country levels of the proportion of individuals meeting PA recommendations. Furthermore, all three self-report methods (IPAQ, RPAQ, and EPIC-PAQ) demonstrated low-to-moderate sensitivity to correctly classify inactive people, and the agreement between PA measures was low, suggesting weak relationships between PAQs and the criterion method. Nevertheless, self-report is vital for measuring attitudes, perception of environment, activity types, and context. Thus, implementation of a combination of objective and subjective assessment methods in large scale cohort studies and surveillance systems should be a priority in future PA research.