A Review of Automatic Travel Mode Detection Methods

Household trip data is crucial for managing existing transportation infrastructure as well as for planning and designing future facilities. It also provides a basis for designing new policies implemented under Transportation Demand Management and for assessing their effectiveness. Over time, the methods used for household trip data collection have evolved, starting from conventional face-to-face or paper-and-pencil interviews, moving on to mail-back and internet-based surveys, and finally reaching the recent approach of passive data gathering. Recording travel data automatically requires modern technology in the form of various sensors, together with intelligent algorithms that infer the required information from the sensor data. These sensors can be integrated into a purpose-built device or, more recently, into smartphones. The current study provides a comprehensive review of the research on travel mode detection from data collected passively with the help of various devices. The review starts with Global Positioning System (GPS) loggers, moves on to purpose-built wearable devices containing additional sensors, and ends with the most recent approach of using smartphones. The summary tables presented in this study should be of value to researchers seeking insight into this research field.


INTRODUCTION
Accurate household travel data collection is indispensable for the efficient design and management of transportation infrastructure. The introduction of new policies, as well as the modification or discontinuation of existing ones, requires ground data. This ground data should be logical, appropriate, abundant and reliable, and hence requires a cost-effective, labor-saving and well-organized collection method. The aim is to develop a methodology in which data can be recorded without putting undue pressure on the respondent, but at the same time without compromising the accuracy and reliability of the collected data. Usually the collected data sheds light on the trip origin and destination population, usually by mail. The respondents were requested to record their answers on the questionnaire and return them. Initially, the questionnaire targeted only one-day trip information, but gradually it incorporated multi-day travel data in the form of multi-day questionnaires and travel diaries. Around the time when paper-based surveys were starting to replace face-to-face interviews, telephone surveys were also introduced [2]. The major difference between face-to-face interviews and telephone interviews was that the interviewer was not required to meet the respondent in person; instead, the interviewer could ask the questions over the telephone. A hybrid approach merged paper-based surveys and telephone interviews: the paper questionnaire was mailed to the respondent's mailing address and the responses were later collected via telephone interview.
Although paper-based surveys and telephone surveys were less costly than face-to-face interviews, the cost was still high, and this played a major role in limiting the sample size. To cope with this problem, computer-assisted surveys were introduced in the 1980s. These surveys collected diary-type survey data in an electronic format. The methodologies employed were Computer-Assisted Telephone Interviews (CATI), Computer-Assisted Personal Interviews (CAPI) and Computer-Assisted Self-Interviews (CASI) [3]. CATI was typically applied on PCs; CAPI on PCs, laptops and handheld devices; and CASI through the internet [4]. The computer-assisted surveys were a refinement of the earlier face-to-face interviews [5], but failed to adequately address the basic limitations of Person Trip (PT) data collection methods [6][7][8].
All these methods share one core drawback: reliance on the memory of the respondent. Because exact time stamps are not available, this dependence results in approximate starting and ending times, as well as in short trips being overlooked. Furthermore, the perception of time varies with the mode of transportation used. For example, a person travelling by car will underestimate the travel time, whereas the same person travelling via public transport on the same route will overestimate it [9,10]. Personal preferences also tend to introduce bias into the responses. All of this decreases the accuracy of the collected data. To minimize the impact of personal errors, a new approach has been introduced in which travel data is recorded automatically by devices either placed at fixed locations or carried by the respondents themselves [8]. These devices can be equipped with one or more sensors such as GPS, accelerometer, gyroscope, magnetometer, compass and barometer.
The present study provides a review of the research done in the field of automatic detection of transport mode. The paper follows the order in which the various technologies were introduced, starting with GPS loggers, then moving to other wearable sensor devices, and finally discussing the use of smartphones. The studies included are mostly open-access papers or papers from journals subscribed to by the University of Tokyo.

GPS LOGGERS
Initial studies only utilized GPS devices installed in vehicles, hence it was impractical to monitor other modes of transportation such as walking or bicycling [11]. As technology progressed, portable GPS devices were introduced, which were capable of recording non-driving trips as well. Consequently, various modes could be covered. This led to significant research on the development of algorithms that can infer trip details such as travel mode and trip purpose [12][13][14]. Studies using wearable devices used sample sizes ranging from 100 to 2000 participants [15][16]. In some cases, the users were asked to respond to a prompted recall survey [17][18][19]. In these prompted recall surveys, GPS data was processed and then presented back to the participants, usually in the form of detailed maps, for confirmation and/or correction of the inferred trip details. Additional information could also be collected at the same time [20].
In one study, a GPS logger was used to collect nearly 60 hours of GPS data, which was then sampled at varying frequencies to simulate data collection by mobile phones [21]. The classification results, obtained by applying Neural Networks (NN), indicated that a higher sampling frequency and a longer monitoring duration result in better mode detection accuracy. Another study investigated the possibility of using a GPS-based method for multi-day travel surveys [22].
The proposed system consisted of two main processes: an interpretation and a validation process. During the interpretation process, a spatial Database Management System (DBMS) was used to combine data inputs from three different sources: GPS logs from the participants carrying the GPS devices, personal characteristics of the participants collected by a survey and Geographic Information System (GIS) data. After interpretation of travel characteristics, the results were provided to the participants in the form of maps and tables for the validation process. After the results were validated and/or modified, interpretation was again performed in light of the new information provided.
A GPS/GIS technique for travel mode detection was tested in New York City (NYC) [23]. Two surveys were conducted at different locations and times in NYC. The developed algorithm used the GPS data in conjunction with the GIS information to distinguish among walk, car, bus, subway and commuter rail. In an extension of this study, the problem of warm start, often experienced with GPS loggers, was addressed by keeping the GPS devices operational all day long [24]. To identify the trip ends, a set radius was incorporated in addition to the dwell time criterion. The resulting methodology was reported to be more accurate than the previous approach.
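The combination of a dwell time criterion with a set radius for identifying trip ends can be sketched as follows. This is a minimal illustration and not the algorithm of [24]: the function names, the 120-second dwell time and the 50-metre radius are assumptions chosen only for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in metres."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def find_trip_ends(fixes, dwell_s=120.0, radius_m=50.0):
    """Return indices of fixes at which a stop (candidate trip end) begins.

    fixes: list of (timestamp_s, lat, lon) tuples sorted by time.
    dwell_s and radius_m are hypothetical thresholds, not values from [24].
    A stop is flagged when the following fixes stay within radius_m of an
    anchor fix for at least dwell_s seconds.
    """
    ends = []
    i, n = 0, len(fixes)
    while i < n:
        t0, lat0, lon0 = fixes[i]
        j = i + 1
        # Extend the run while consecutive fixes remain inside the radius.
        while j < n and haversine_m(lat0, lon0, fixes[j][1], fixes[j][2]) <= radius_m:
            j += 1
        if fixes[j - 1][0] - t0 >= dwell_s:
            ends.append(i)  # the traveller dwelt here long enough
            i = j           # continue after the stop
        else:
            i += 1
    return ends
```

Using both criteria together means that a short pause, such as waiting at a traffic signal, is not counted as a trip end, while slow drift of the GPS position inside the radius does not break up a genuine stop.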
One study targeted aspects usually neglected by other researchers, such as sample size, duration, granularity, selection of variables and method of inference [25]. The appropriate number of participants was calculated to be 81, and this number was used. The duration of data collection was set to 2 weeks in order to account for weekly variation. A Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel was employed to classify the data using two features: speed and acceleration. Another study employed 12 participants, equipped with GPS loggers, to gather GPS locations and speed while making two commutes back and forth to work at the Institute for Risk Assessment Sciences, Utrecht, Netherlands [26]. The speed metrics evaluated included the mean, the 95th percentile of speed, the standard deviation of the mean, the rate of change, the standardized rate of change, acceleration and deceleration. The Cohen's Kappa value was 0.73 when classifying the modes as walking, bicycling, car, bus and train, but it increased to 0.95 when all motorized transport was combined into one category.
Table 1 provides a summary of the studies that used GPS loggers for mode detection. It is evident that GPS surveys provide quite accurate data; however, they have some limitations. Most noteworthy is signal loss due to cold/warm start and urban canyons. When a GPS device is turned on at the beginning of the day (cold start) or switches from "sleep mode" to "active mode" (warm start) after a short stoppage during the day, some time is required for the device to connect to the GPS satellites. During this connecting time, the recorded data is erroneous. The urban canyon effect is created by tall buildings surrounding the GPS device, which degrade GPS reception and cause signal loss. As a result, whole trips or parts of trips are missed and spurious trips are registered.
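For readers unfamiliar with the Cohen's Kappa statistic reported in [26], the sketch below shows how it is computed from true and predicted mode labels, and why merging motorized modes into one category tends to raise it. The function names and the toy labels are assumptions made for illustration; they are not data from [26].

```python
from collections import Counter

def cohens_kappa(true_labels, predicted_labels):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    # Observed agreement: fraction of samples where the labels match.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Expected agreement if both sequences were assigned independently.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class in both sequences
    return (observed - expected) / (1.0 - expected)

def merge_motorized(labels, motorized=("car", "bus", "train")):
    """Collapse all motorized modes into a single 'motorized' category."""
    return ["motorized" if lab in motorized else lab for lab in labels]

# Toy example (hypothetical labels): car and bus are confused with each other.
truth = ["walk", "walk", "car", "bus"]
guess = ["walk", "walk", "bus", "car"]
kappa_detailed = cohens_kappa(truth, guess)
kappa_merged = cohens_kappa(merge_motorized(truth), merge_motorized(guess))
```

In this toy example the kappa rises from 0.2 to 1.0 after merging, because the only errors were confusions between two motorized modes.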

PURPOSE-BUILT DEVICES
Numerous studies required the participants to record the trip information in Personal Digital Assistants (PDAs) or manual travel diaries, in addition to carrying the GPS devices [5,27,28]. This proved to be tedious for the participants, which steered the research towards passive devices. These devices were easy to use, as they required no manual data input. The data recorded by the sensors embedded in such devices was processed to obtain the required information. In addition to GPS, other sensors, including the accelerometer, were also tested in combination [29]. A summary of the studies that used purpose-built wearable devices for travel mode detection is given in Table 2.
In one study, activity recognition was explored by placing multiple sensors at different positions on the body [30]. The sensors included a dual-axis accelerometer, a microphone, a temperature sensor and a light sensor. These four sensors were housed in a device called eWatch, based on the Philips LPC2106 ARM7 TDMI microcontroller. Each participant wore six eWatch devices, placed on the left wrist, belt, necklace, right trouser pocket, shirt pocket and bag.
A study documented the development of a small wearable device called the Mobile Sensing Platform (MSP) for activity recognition [35]. Multiple sensors were integrated into the device, including an electret microphone, visible light phototransistor, 3-axis digital accelerometer, digital barometer/temperature, digital IR and visible+IR light, digital humidity/temperature and digital compass. The device took four years to build, during which it went through modifications following lessons learned from several deployments. The study reported three crucial capabilities required for mobile inference systems: the device should be small yet able to recognize a broad range of activities; the system hardware should have enough storage, computational capacity and battery power for at least an entire day; and the recognition algorithm should be efficient.
A comparison between the various pre-processing techniques used in several studies was carried out in [32]. In addition to travel mode prediction accuracy, the computational costs and storage requirements of the features were also taken into account. Observed activities included walking, running and jumping. Almost 50% of the data was used to train the algorithm. The results suggested that for the three-activity scenario, the best frequency-domain techniques yielded results comparable to the best time-domain techniques. However, for the two-activity scenario, the best time-domain techniques prevailed.
Identification of behavioral context based on information from multiple sensors was demonstrated in [8]. A purpose-built device named Behavioral Context Addressable Loggers in the Shell (BCALs) was introduced, containing numerous sensors including an accelerometer, barometer and microphone. The data from these sensors was analyzed to identify the travel mode, change of floor and type of place. Accelerometer data collected from three cities in Japan using BCALs was later included in another study [29]. The raw accelerometer data was preprocessed and features were extracted. The study also provided a comparison among four classification algorithms, namely SVM, AdaBoost, decision tree and random forest. The reported accuracy in identifying four different modes of transportation was high.
While employing GPS and accelerometer for data collection, three approaches were compared: GPS data only, accelerometer data only, and GPS combined with accelerometer data [33]. The study used a Bayesian belief network model for classification. Results showed that the acceleration-only approach, with a mean validation accuracy of 88.87%, works better than GPS only (mean 78.4%), but the combined approach outperforms both, with a mean validation accuracy of 91.7%.
The Personal Activity Location Measurement System (PALMs) was employed to process the GPS data in order to screen out invalid coordinates [34]. A simple moving average filter was applied to the predicted class labels: the predictions from the previous 2 minutes and the following 2 minutes were analyzed, and the mode with the highest number of predictions was taken as the output.
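The label filter described for [34] amounts to a majority vote over a time window centred on each prediction. A minimal sketch is given below; the function name is hypothetical, and including the current prediction in its own window is an assumption, since the source only mentions the previous and following 2 minutes.

```python
from collections import Counter

def smooth_labels(timestamps, labels, half_window_s=120.0):
    """Replace each predicted label by the most frequent label in its time window.

    timestamps: prediction times in seconds; labels: predicted modes.
    The window spans half_window_s before and after each prediction and,
    as an assumption of this sketch, includes the prediction itself.
    Ties go to the label that appears first within the window.
    """
    smoothed = []
    for t in timestamps:
        window = [lab for ts, lab in zip(timestamps, labels)
                  if abs(ts - t) <= half_window_s]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

# Example: one isolated "car" prediction inside a walking episode is removed.
times = [0, 60, 120, 180, 240]
raw = ["walk", "walk", "car", "walk", "walk"]
cleaned = smooth_labels(times, raw)
```

This quadratic-time version is kept simple for clarity; a sliding window would be preferable for long recordings.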

SMARTPHONES
Smartphone penetration is on the rise, even in developing countries, and smartphones are becoming indispensable. In addition to their rapid spread, the other aspect that caught the attention of researchers was the integration of a vast array of sensors, the most significant being GPS and the accelerometer. For their users, smartphones are an essential part of daily life, which gives them a major advantage over purpose-built devices with similar sensors, since respondents must take on the cumbersome task of carrying those devices everywhere. With the help of the GPS sensor, smartphones are capable of collecting data similar to that collected by handheld GPS devices. The accelerometer is another very useful sensor in smartphones. It can detect acceleration with respect to gravity along the three coordinate axes. A summary of the research done using smartphones is provided in Table 3. Some studies have used smartphones to conduct GPS surveys [36][37][38]. Common to all studies is the extraction of suitable features from the raw sensor data, the formation of a training dataset to train a classification algorithm, and the prediction of test data by feeding it to the trained algorithm. Regarding the classification algorithms, a number of studies have compared several of them in order to determine the most appropriate, e.g. SVM, Neural Nets, Logistic Regression, Naïve Bayes, Memory-based learning, Random forest, Decision trees, Bagged trees, Boosted trees and Boosted stumps [39], Naïve Bayes and SVMs [40], Decision trees, K-means clustering, Naïve Bayes, Nearest neighbor, SVMs, Continuous Hidden Markov model and Decision trees [41], and Naïve Bayes, Bayesian network, Decision trees, Random forest and Multi-Layer Perceptron (MLP) [42].
In one study, a critical point algorithm was developed that was capable of eliminating unnecessary GPS points [43]. The GPS data was not collected at a fixed interval; rather, recording was done only at strategic locations, such as places where a straight-line path started or ended. Another study employed four participants who specifically used iPhones to collect accelerometer data [44]. SVM was applied to the extracted features to classify the travel mode into walking, running, biking or driving. Similarly, a study utilized simple features extracted from the accelerometer to distinguish among sitting, standing, walking, running, bicycling and driving [45]. During data processing, the effect of jittering noise was reduced by scaling down and rounding the acceleration values, followed by smoothing with a moving average filter. A comparison was made among decision tree, SVM, Naïve Bayes and K-Nearest Neighbor (KNN), with results indicating the decision tree to be the better option. The study concluded that a decision tree could provide a usable model for inferring the physical activity diary, refined by a similarity match from k-means clustering results and smoothed by a Hidden Markov Model (HMM).
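The two-step noise reduction described for [45], scaling down and rounding followed by a moving average, can be illustrated as below. The scale factor, window length and function names are assumptions for this sketch; the source does not state the values used.

```python
def reduce_jitter(values, scale=10.0):
    """Scale down and round so that small fluctuations collapse to the same value.

    scale is a hypothetical factor. Python's round() rounds exact halves to
    the nearest even integer, which does not matter for this illustration.
    """
    return [round(v / scale) for v in values]

def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the two ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Example with made-up raw accelerometer readings (arbitrary units).
raw = [98, 102, 101, 149, 151]
quantised = reduce_jitter(raw)          # small jitter around 100 and 150 disappears
smoothed = moving_average(quantised)    # the step between levels is softened
```

Quantising first removes sensor jitter cheaply, and the moving average then suppresses the remaining short spikes before features are extracted.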
A comparison among various classifiers was provided in [46], and a unique approach of combining a decision tree with a Discrete Hidden Markov Model (DHMM) was introduced. The DHMM was trained using the class posterior probabilities provided by the decision tree classifier. The study was further extended by increasing the amount of collected data [41]. Correlation-Based Feature Selection (CFS) was applied to select suitable features extracted from the GPS and accelerometer data. As expected, the combination of decision tree and DHMM proved to be the best choice, although the overall accuracy achieved was lower than in the previous study. The reason might be the much larger amount of data and the variability among the increased number of participants. A similar approach was adopted in [47], which extracted a large number of features and applied a decision tree followed by a DHMM. A Kalman filter was used to smooth the GPS trajectory, and a high-pass Butterworth filter was used to remove low frequencies present in the accelerometer data.
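The idea of letting a Markov model smooth the output of a per-window classifier can be sketched with Viterbi decoding. This is a simplified illustration and not the DHMM of [46]: the transition probabilities are set by hand through a hypothetical `stay_prob` parameter rather than trained, and the classifier posteriors are used directly in place of emission probabilities.

```python
import math

def viterbi_smooth(posteriors, modes, stay_prob=0.9):
    """Most likely mode sequence given per-window class posteriors.

    posteriors: list of dicts mapping mode -> probability (one dict per window).
    modes: list of at least two mode names.
    stay_prob: assumed probability of keeping the same mode between windows.
    """
    if not posteriors:
        return []
    k = len(modes)
    switch_prob = (1.0 - stay_prob) / (k - 1)

    def log(p):
        return math.log(max(p, 1e-12))  # guard against log(0)

    def trans(prev, cur):
        return log(stay_prob if prev == cur else switch_prob)

    # Initialise with a uniform prior over modes.
    score = {m: log(1.0 / k) + log(posteriors[0][m]) for m in modes}
    back = []
    for post in posteriors[1:]:
        new_score, pointers = {}, {}
        for m in modes:
            best_prev = max(modes, key=lambda p: score[p] + trans(p, m))
            new_score[m] = score[best_prev] + trans(best_prev, m) + log(post[m])
            pointers[m] = best_prev
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    path = [max(modes, key=lambda m: score[m])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Because switching modes between adjacent windows is penalised, a single window in which the classifier weakly prefers another mode is overridden by its neighbours.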
Accelerometer data alone was used to compare Naïve Bayes and SVM for classification among three modes [40]. Another study utilized accelerometer data to distinguish among sitting, standing, walking, jogging, moving upstairs and moving downstairs [48]. The results indicated that the multilayer perceptron outperformed the other algorithms overall, but none of the tested classifiers, including the multilayer perceptron, was able to identify all the activities with good accuracy. In addition to traditional features, new features of average bus closeness, average rail closeness and average candidate bus closeness were introduced in [42]. GPS data collected by smartphones was used in combination with transportation network data to extract classification features. These features were fed to a classification algorithm for learning and subsequent testing. The transportation network data covered the city of Chicago, Illinois, USA and consisted of real-time locations of buses, spatial data of rail lines and location data of bus stops. The extracted feature set contained (1) average accuracy of GPS coordinates, (2) average speed, (3) average heading change, (4) average acceleration, (5) bus location closeness, (6) rail line trajectory closeness, and (7) bus stop closeness rate. Moreover, a ZIP-code-based indexing and pruning method was utilized, in which each GPS entry was compared only with the transportation network data in the same ZIP code and not with the entire dataset. GPS data was collected by 6 participants over a period of 3 weeks using 3 types of mobile devices: (1) HP iPAQ PDA, (2) Samsung Galaxy and (3) iPhone 3G. In order to save battery power, GPS sensor reports were submitted every 15 seconds. A window size of 30 seconds was chosen for feature selection. A comparison was made among five classifiers: Bayesian Net, Decision Tree, Random Forest, Naïve Bayes and MLP. Random Forest was reported to outperform all the others.
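Three of the traditional features listed for [42] (average speed, average heading change and average acceleration) can be derived from the GPS fixes in one window roughly as follows. This is a sketch under assumptions: the function and key names are hypothetical, the acceleration is taken as the signed change in segment speed, and the network-based closeness features are not covered.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def window_features(fixes):
    """Features for one window of at least three (timestamp_s, lat, lon) fixes."""
    speeds, bearings, durations = [], [], []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        dt = t2 - t1
        speeds.append(haversine_m(la1, lo1, la2, lo2) / dt)
        bearings.append(bearing_deg(la1, lo1, la2, lo2))
        durations.append(dt)
    # Smallest absolute angle between consecutive segment bearings.
    turns = [abs((b2 - b1 + 180.0) % 360.0 - 180.0)
             for b1, b2 in zip(bearings, bearings[1:])]
    # Speed change divided by the time between segment midpoints.
    accels = [(v2 - v1) / ((d1 + d2) / 2.0)
              for v1, v2, d1, d2 in zip(speeds, speeds[1:], durations, durations[1:])]
    return {
        "avg_speed_mps": sum(speeds) / len(speeds),
        "avg_heading_change_deg": sum(turns) / len(turns),
        "avg_acceleration_mps2": sum(accels) / len(accels),
    }
```

With reports every 15 seconds and a 30-second window, each window holds three fixes, which yields two segment speeds, one heading change and one acceleration value.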
Preliminary results of an ongoing project for automatically reconstructing trips in travel surveys using smartphones were provided in [49]. The data was collected in the city of Vienna, Austria. In instances where GPS data was missing due to signal loss, the features extracted from the accelerometer were used alone. Another study concluded that random forest is remarkably accurate in classifying travel modes and therefore does not require a post-processing technique such as the Viterbi algorithm [50]. Random forest was also reported to be quite efficient, requiring only a few minutes of training time, as opposed to a minimum of three hours for SVM and NN. An accelerometer-based mode detection system applied in a hierarchical manner was proposed in [51]. First, a kinematic motion classifier was used to distinguish between walking and other modes, followed by a stationary classifier to determine whether the user is stationary or using motorized transport. Lastly, a motorized classifier was applied to classify the activity into one of five modes: car, bus, train, metro or tram.
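The hierarchical arrangement described for [51] can be expressed as a simple cascade in which each stage is consulted only if the previous one did not settle the label. The sketch below shows the control flow only: the three stage functions, the feature names `accel_std` and `peak_freq_hz`, and all thresholds are hypothetical placeholders, not the classifiers of [51].

```python
def classify_hierarchical(features, is_walking, is_stationary, motorized_mode):
    """Cascade of three classifiers: kinematic, stationary, then motorized."""
    if is_walking(features):          # stage 1: walking versus everything else
        return "walking"
    if is_stationary(features):       # stage 2: stationary versus motorized
        return "stationary"
    return motorized_mode(features)   # stage 3: car, bus, train, metro or tram

# Placeholder stages with made-up thresholds, standing in for trained models.
def toy_is_walking(f):
    return f["accel_std"] > 1.5

def toy_is_stationary(f):
    return f["accel_std"] < 0.05

def toy_motorized_mode(f):
    return "car" if f["peak_freq_hz"] > 2.0 else "train"

label = classify_hierarchical(
    {"accel_std": 0.5, "peak_freq_hz": 3.0},
    toy_is_walking, toy_is_stationary, toy_motorized_mode,
)
```

Passing the stages in as functions keeps the cascade independent of how each stage is trained, so any stage can be replaced without touching the others.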
A study utilized the GPS and accelerometer data collected by 18 smartphone users to classify the transportation modes into stationary, walking, bicycling and motorized transport [52]. The stationary mode was further classified as stay (remaining in the same place for a long time) and wait (remaining in the same place for a short time). Another study utilized intuitive logic to detect transportation context from barometer readings collected by smartphones [53]. Thirteen participants from 3 different countries collected the data used to test the algorithm against Google's accelerometer-based Activity Recognition algorithm and the Future Urban Mobility Survey's (FMS) GPS-accelerometer server-based application. The developed approach was reported to be more energy-efficient, with accuracy comparable to both Google and FMS. The accuracy was further improved by fusing barometer and accelerometer data. In another study, the walking activity was first identified by analyzing the accelerometer data; it then acted as a separator and assisted in partitioning the data into other activity segments [54]. The mean acceleration value of walking was reported to be about 27 times higher than the mean accelerations of other activities, making it very easy to identify. The segments with vehicular transport were classified as car, bus or train/tram by setting separating acceleration values. Another study investigated the working of different algorithms, especially SVM, in order to come up with a low-power classifier [55]. The gyroscope was reported to be responsible for 85% of the total power consumption; therefore, to decrease the power uptake, the concept of a virtual gyroscope was introduced. Instead of collecting data from the gyroscope, the data was simulated from the combined accelerometer and magnetometer data. To filter out short-term noise, a simple but effective voting scheme was implemented. The effect of decreasing data collection frequency was investigated in [56].
Sensor data collected by smartphones was processed to extract nine features for classification. Random Forest was used for the task and resulted in high detection accuracy. As the data collection frequency was decreased from the original 10 Hz to 0.2 Hz, the detection accuracy dropped sharply; however, the computation cost also decreased drastically. It was concluded that a balance has to be maintained between the accuracy required and the cost to be incurred.
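An experiment of this kind can be emulated on an existing recording by keeping only every k-th sample. The sketch below assumes simple decimation without any prior low-pass filtering, which may differ from the procedure used in [56]; the function name is hypothetical.

```python
def downsample(samples, original_hz, target_hz):
    """Emulate a lower sampling rate by keeping every k-th sample.

    k is the ratio original_hz / target_hz, rounded to the nearest integer.
    No anti-aliasing filter is applied, which is an assumption of this sketch.
    """
    if target_hz <= 0 or target_hz > original_hz:
        raise ValueError("target_hz must be positive and not exceed original_hz")
    step = round(original_hz / target_hz)
    return samples[::step]

# Ten seconds of data at 10 Hz reduced to 0.2 Hz leaves one sample per 5 seconds.
recording = list(range(100))
reduced = downsample(recording, original_hz=10, target_hz=0.2)
```

Going from 10 Hz to 0.2 Hz keeps only one sample in fifty, which shows why both the computation cost and the information available to the classifier fall so steeply.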

CONCLUSION
The recent approach of passive data collection and inference of travel mode has been the focus of many researchers. The successful and widespread implementation of such a methodology will be highly valuable to the transportation community, as very detailed data will become available, substantially improving the understanding of people's travel behavior. This will be of great help in policy implementation as well as in infrastructure management and design. Such a technology will have applications not only in the transportation sector but also in fields such as healthcare, mapping and marketing.
The present study aims to provide a comprehensive review of the work done in the field of automatic travel mode detection, but covering all of the research is impossible, as this is an active research area and new studies are continuously being added to the body of knowledge. Nevertheless, the information provided here should be sufficient for readers to become acquainted with the field. Identification of trip purpose is another objective of passive data collection, but it is purposely not discussed in this study, as it is a vast topic in itself. This subject will be covered in a future study.