
eta-group-report

Published by pamelakelly7, 2018-03-20 09:34:15


Research Practicum Project Report

Dublin Bus Travel Time Prediction

Andrew Cameron, Pamela Kelly, Conan Martin & Fangxue Mei

A thesis submitted in part fulfilment of the degree of MSc. in Computer Science (Conversion)

Group Number: 7
COMP 47360
UCD School of Computer Science
University College Dublin
February 8, 2018

Abstract

This report details the development of a software system that provides accurate estimates of travel time on the Dublin Bus service based on historical data from 2012/2013. The report covers how the analysis of historical data was used to create a prediction system using standard data analytics practices, as well as the development of the web application that serves the resulting predictions to the end user.

The project found that a prediction system based on a linear regression model can be created that delivers predictions with a mean absolute error of 7.2 minutes. The solution presented in this report considers variables such as weather, congestion and time, and uses them to predict more dynamic travel times than currently available systems provide. In addition, the solution endeavors to suggest alternative routes to the user, using a combination of data structures and a K nearest neighbor search algorithm.

Acknowledgments

We would like to thank the following people for their advice and guidance throughout this project: Gavin McArdle, Vivek Nallur, Hamed Jahromi and Ellen Rushe.

Table of Contents

1 Chapter 1: Introduction
2 Chapter 2: Main Body
  2.1 Problem Definition
  2.2 Software Requirements
  2.3 System Architecture and Deployment
  2.4 Web Application
  2.5 Data Analytics
  2.6 Innovation
3 Progress Report

Chapter 1: Introduction

The core purpose of this project is the development of a dynamic travel time prediction system. The system builds on currently available services by considering additional variables such as weather, time, day and congestion. This is achieved through an in-depth data analysis process, in which the team explored the raw features in the data and additional data sourced elsewhere (e.g. Met Eireann, National Transport Authority), and engineered features to represent some of the key domain concepts identified. The subsequent development of a predictive model suited to the needs of the project allows the system to provide its users with a dynamic travel time prediction based on this historical data.

The second key aspect of the system is the development of a responsive web application which interacts with this trained predictive model and serves the resulting predictions to the end user. User experience is a key priority. To satisfy it, the team has developed a performant application with a clean interface optimised for mobile devices.

This report describes the development of this system at a high level. It outlines the software requirements, system architecture and development stack. It also describes the key development processes involved in the project: data analytics, back end development, front end development and deployment. Finally, the report details the team's progress so far and includes a detailed timeline for work still to be completed.

Chapter 2: Main Body

2.1 Problem Definition

When considering the target user for the system, the team decided to take a consumer-focused approach to the problem. This choice is justified by the room in the market for dynamic travel time prediction. Building on this target user, the team identified a number of use cases:

• Commuter: a user that travels via public transport on a daily basis. Accuracy and reliability are key for this user.
• Tourist: a user that is unfamiliar with the public transport system of Dublin and needs to plan excursions ahead of time.
• Casual passenger: a user that is an infrequent passenger, but might plan a trip ahead of time.

With this in mind, the team defined the requirements below for the system.

2.2 Software Requirements

The system should provide a web-based interface for users. This interface should be mobile optimised. In addition, the system should utilize an optimized predictive model for calculating the travel time estimates. This model should be produced using relevant data analytics methods and evaluated for performance and accuracy. The final application should serve these predictions to the user based on their route preferences.

2.3 System Architecture and Deployment

In order to implement the system outlined above, the team has chosen a development stack that is both familiar and appropriate. This stack consists of HTML, CSS and JavaScript for the front end; the Flask web framework for the back end; Python, Pandas and Jupyter Notebook for data analytics; and Docker and Nginx for deployment. All static data, such as timetables, stop information and route information, is stored in a MySQL database hosted on the cloud service Amazon RDS. The pickled model will be loaded into RAM when the application is run.

Updated versions of the application are deployed simply with the use of git, and thanks to the use of Docker containers, deployment problems are kept to a minimum. The application will be hosted on a remote virtual compute instance, provided by UCD.
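The load-once behaviour described for the pickled model can be sketched roughly as follows. This is a minimal illustration rather than the project's code: the file name, the `save_model` and `get_model` helpers, and the dictionary of coefficients standing in for the trained model are all assumptions. The sketch loads the model lazily on first use; a web application could equally call the loader once at startup.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

# Hypothetical stand-in for the trained model: a plain dict of coefficients.
EXAMPLE_MODEL = {"intercept": 120.0, "distance_km": 95.0, "rain_mm": 4.0}


def save_model(model, path):
    """Serialise the model to disk once, after training."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


@lru_cache(maxsize=None)
def get_model(path):
    """Load the pickled model on first use and keep it in memory afterwards."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical file location used only for this example.
model_path = str(Path(tempfile.gettempdir()) / "example_model.pkl")
save_model(EXAMPLE_MODEL, model_path)

first = get_model(model_path)
second = get_model(model_path)  # served from the in-memory cache, no disk read
```

Keeping the deserialised object in memory means each request pays only for the prediction itself, not for reading and unpickling the file.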

We considered alternative stacks, including Django, but opted for the flexibility that Flask provides, supplementing missing functionality with additional libraries such as a Flask cross-origin extension and SQLAlchemy for object mapping.

2.4 Web Application

2.4.1 User Interface: Front End Development

Bootstrap has been utilised to aid in the design of the site, and to provide a responsive, mobile-friendly experience.

Utilising the Google Maps API, a view of Dublin City is presented, with pins placed for a route's stops when appropriate. Currently users are presented with a dynamic form which allows them to refine their route preferences based on origin, destination, day and time. The form updates dynamically, offering only the linked stops and available routes for the specified origin and destination.

Once the user's preferences are submitted, they are presented with information on the weather forecast for their selected time and date, as well as a predicted travel time between the two locations. The user will also be provided with any additional information generated by the innovative features developed, such as alternative routes and data visualisation of journey trends.
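The filtering performed by the dynamic form, where only linked stops and available routes are offered for a chosen origin, could be approximated along the following lines. This is a sketch under stated assumptions: the route names, stop names and the `linked_destinations` helper are hypothetical, and the real application reads this information from its database.

```python
# Hypothetical journey patterns: route name -> ordered list of stop names.
JOURNEY_PATTERNS = {
    "route_a": ["stop_1", "stop_2", "stop_3", "stop_4"],
    "route_b": ["stop_2", "stop_5", "stop_4", "stop_6"],
}


def linked_destinations(origin, patterns=JOURNEY_PATTERNS):
    """Map each stop reachable from `origin` to the routes that serve it.

    A destination counts as linked only if it comes after the origin on
    the same journey pattern.
    """
    linked = {}
    for route, stops in patterns.items():
        if origin not in stops:
            continue
        for stop in stops[stops.index(origin) + 1:]:
            linked.setdefault(stop, []).append(route)
    return linked


# Destinations (and the routes serving them) offered once "stop_2" is chosen.
options = linked_destinations("stop_2")
```

Here `options` would drive both drop-downs: its keys are the selectable destinations, and the list stored under the chosen destination gives the available routes.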

2.5 Data Analytics

2.5.1 Sourcing Data

In order to conduct effective data analysis, a project needs substantial and appropriate data. Two months of historical GPS data were provided. This data details the movements of Dublin Bus vehicles over the course of November 2012 and January 2013. In addition to this data, we sourced historical weather data from Met Eireann. It was also necessary to source data for the application: stop locations from the National Transport Authority, and timetable data by scraping web.archive.org.

In the initial stages of data exploration, using 2012 data enabled the team to maintain a broader range of options for descriptive features. Features such as journey patterns and stop ids were specific to 2012, and much of this information had changed drastically in the five years since. The model has since been generalized so that the descriptive features do not contain any data specific to 2012, except perhaps congestion. This makes it easier to adapt the system for 2017. The team decided to use 2012 data as a verifiable proof of concept.

2.5.2 Data Analysis and Exploration

The project followed the CRISP-DM process, which enabled the team to work through the data analysis iteratively. Initially, discussion of the business understanding of the project led to brainstorming domain concepts and possible features. Statistical analysis and visualisation helped to improve understanding of the data and to determine iteratively the most significant descriptive features and the appropriate target feature. This process was streamlined through the use of a data quality report template, which allowed the team to identify critical data quality issues in its derived features and rectify them. This data preparation has ensured a body of data with minimal quality issues, and was guided by the process outlined by Kelleher et al. [2]

The target feature decided on was travel time from the first stop on the journey pattern to each subsequent stop. Using this feature we can derive the time between any two stops on a specific journey pattern. The model is queried for both the origin and the destination specified by the user, and the prediction for the origin is subtracted from the prediction for the destination to give the travel time for their journey. This also provides the estimated travel time to the origin stop, which can be used with the timetables to estimate the arrival time at the origin stop.

2.5.3 Modeling

Regression models were selected as appropriate for the continuous nature of the target feature. The models tested were multiple linear regression, support vector regression, a neural network model and a random forest regressor. The models were evaluated using plots, mean absolute error and r-squared. The selection was narrowed down to multiple linear regression and random forest, as those models produced the best results. Linear regression provided a good balance between accuracy and performance, while also enabling the generalisation of the modeling process by removing the use of large categorical features such as journey pattern id. This linear model was designed to apply to the entire data set: it excludes journey pattern id, which improves the system's scalability and portability. The modeling process was informed by research of the appropriate literature. [3]
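The way a stop-to-stop travel time is derived from the target feature in 2.5.2 can be illustrated with a short sketch. The toy model below, its features and its coefficients are invented for illustration and are not the project's trained model; only the subtraction step and the timetable arithmetic reflect the approach described above.

```python
from datetime import datetime, timedelta


def predict_from_first_stop(distance_km, rush_hour, rain_mm):
    """Toy linear model: predicted seconds from the first stop to a given stop.

    The features and coefficients are invented for illustration only.
    """
    seconds_per_km = 150.0 + 20.0 * rush_hour + 5.0 * rain_mm
    return seconds_per_km * distance_km


def journey_estimate(origin_km, destination_km, rush_hour, rain_mm):
    """Query the model for both stops and subtract origin from destination."""
    if destination_km <= origin_km:
        raise ValueError("destination must come after origin on the pattern")
    to_origin = predict_from_first_stop(origin_km, rush_hour, rain_mm)
    to_destination = predict_from_first_stop(destination_km, rush_hour, rain_mm)
    return {"to_origin": to_origin, "journey": to_destination - to_origin}


# Origin 2.0 km and destination 6.5 km from the first stop, rush hour, no rain.
estimate = journey_estimate(2.0, 6.5, rush_hour=1, rain_mm=0.0)

# Combine the timetabled departure from the first stop (hypothetical value)
# with the predicted time to the origin to estimate arrival at the origin.
timetabled_departure = datetime(2012, 11, 6, 8, 0)
arrival_at_origin = timetabled_departure + timedelta(seconds=estimate["to_origin"])
```

Because both predictions are measured from the same first stop, any constant offset in the model cancels in the subtraction, which is one reason a single target feature can serve every origin and destination pair on a journey pattern.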

2.6 Innovation

2.6.1 Route Suggestions

In an attempt to provide a more complete user experience, a solution to find the best route based on an arbitrary origin and destination is being explored. This would allow the system to propose journeys spanning multiple bus lines and enable the user to select their origin and destination without any prior knowledge of current bus routes or bus stop locations.

An early prototype of this feature currently combines a bi-directional unweighted multigraph, a two-dimensional tree and hash tables. These data structures were chosen in order to optimise performance.

2.6.2 Real Time Data

In order to provide a comprehensive user experience, the system should provide real time data for Dublin Bus stops as well as estimated travel time. To do this, the system will query the Dublin Bus API and display the real time data for the stop selected by the user.
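The data structures named in 2.6.1 might fit together roughly as sketched below. This is an assumption-laden illustration, not the prototype itself: the stops, coordinates, line names and all function names are hypothetical, the tree search returns only the single nearest stop (k = 1), and breadth-first search is used here as one plausible way to traverse an unweighted graph.

```python
from collections import deque

# Hypothetical stops on a flat grid: stop id -> (x, y).
STOPS = {"A": (0.0, 0.0), "B": (2.0, 1.0), "C": (4.0, 0.0),
         "D": (4.0, 3.0), "E": (6.0, 3.0)}
# Hypothetical bus lines as ordered stop sequences.
LINES = {"line_1": ["A", "B", "C"], "line_2": ["C", "D", "E"]}


def build_tree(items, depth=0):
    """Build a 2-d tree from (stop_id, (x, y)) pairs, splitting on x then y."""
    if not items:
        return None
    axis = depth % 2
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2
    return {"item": items[mid], "axis": axis,
            "left": build_tree(items[:mid], depth + 1),
            "right": build_tree(items[mid + 1:], depth + 1)}


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node, point, best=None):
    """Return the (stop_id, coords) pair closest to `point`."""
    if node is None:
        return best
    coords = node["item"][1]
    if best is None or squared_distance(coords, point) < squared_distance(best[1], point):
        best = node["item"]
    diff = point[node["axis"]] - coords[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, point, best)
    # Search the far side only if the splitting line is closer than the best match.
    if diff ** 2 < squared_distance(best[1], point):
        best = nearest(far, point, best)
    return best


def build_graph(lines):
    """Adjacency hash table: stop -> list of (neighbour, line) edges.

    Edges are added in both directions, and parallel edges from different
    lines are kept, which makes this a bi-directional multigraph.
    """
    graph = {}
    for line, stops in lines.items():
        for a, b in zip(stops, stops[1:]):
            graph.setdefault(a, []).append((b, line))
            graph.setdefault(b, []).append((a, line))
    return graph


def fewest_hops(graph, origin, destination):
    """Breadth-first search for a path with the fewest stop-to-stop hops."""
    previous = {origin: None}
    queue = deque([origin])
    while queue:
        stop = queue.popleft()
        if stop == destination:
            break
        for neighbour, line in graph.get(stop, []):
            if neighbour not in previous:
                previous[neighbour] = (stop, line)
                queue.append(neighbour)
    if destination not in previous:
        return None
    path, stop = [], destination
    while previous[stop] is not None:
        prev_stop, line = previous[stop]
        path.append((prev_stop, stop, line))
        stop = prev_stop
    return list(reversed(path))


tree = build_tree(list(STOPS.items()))
graph = build_graph(LINES)
origin_stop = nearest(tree, (0.3, 0.2))[0]
destination_stop = nearest(tree, (5.8, 2.9))[0]
route = fewest_hops(graph, origin_stop, destination_stop)
```

In this arrangement the tree turns an arbitrary location into a stop, the hash tables give constant-time lookup of a stop's coordinates and neighbours, and the line label stored on each edge shows where a journey changes bus.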

Chapter 3: Progress Report

The most recent model performs with a mean absolute error of 7.2 minutes. This model is in the process of being deployed and connected to the front end. The user query is currently parsed to provide the data needed for the baseline model. This is being updated to provide the features needed for the newer model before deployment.

The current working prototype enables the user to select their desired origin and dynamically produces a list of connected destination stops, as well as a list of relevant routes based on the destination. The maps functionality is implemented, producing pins for the origin and destination after the user query is submitted. At the moment the system only provides travel times for one journey pattern, 00010001. In order to scale this to the full range of journeys, the data needed by the model for prediction, such as distances from terminus for all routes, is currently being calculated. Once this data is calculated and stored in the database, the system will be able to provide travel times for most journey patterns.

The system currently suggests direct bus lines between the stops specified by the user. The functionality to suggest routes that require changing buses mid-route has been demonstrated, but it still needs to be integrated with the deployed version of the application and made to work in conjunction with the predicted bus times.
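For readers unfamiliar with the metrics used to evaluate the model, mean absolute error and r-squared can be computed as in the sketch below. The journey times are invented example values, not results from the project.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented journey times in minutes, for illustration only.
actual_minutes = [30.0, 45.0, 25.0, 60.0]
predicted_minutes = [35.0, 40.0, 30.0, 50.0]

mae = mean_absolute_error(actual_minutes, predicted_minutes)
r2 = r_squared(actual_minutes, predicted_minutes)
```

Mean absolute error is reported in minutes, so it is directly interpretable by a passenger, while r-squared is unitless and is more useful for comparing candidate models on the same data.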

Bibliography

[1] Matthías Kormáksson, Luciano Barbosa, Marcos R. Vieira, and Bianca Zadrozny. Bus Travel Time Predictions Using Additive Models. IEEE International Conference on Data Mining, 2014. 6 pages.
[2] John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy. Fundamentals of Machine Learning for Predictive Data Analytics.
[3] Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense of Data.
[4] M. Yang, C. Chen, L. Wang, X. Yan, and L. Zhou. Bus Arrival Time Prediction Using Support Vector Machine with Genetic Algorithm.

