research-article

The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands based on Large-Scale Online Platforms

Authors:
Yongxin Tong

Beihang University, Beijing, China

Beihang University, Beijing, China
View Profile

,
Yuqiang Chen

4Paradigm Inc., Beijing, China

4Paradigm Inc., Beijing, China
View Profile

,
Zimu Zhou

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Lei Chen

Hong Kong University of Science and Technology, Hong Kong, China

Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Jie Wang

Didi Research Institute, Beijing, China

Didi Research Institute, Beijing, China
View Profile

,
Qiang Yang

4Paradigm Inc. & Hong Kong University of Science and Technology, Hong Kong, China

4Paradigm Inc. & Hong Kong University of Science and Technology, Hong Kong, China
View Profile

,
Jieping Ye

Didi Research Institute, Beijing, China

Didi Research Institute, Beijing, China
View Profile

,
Weifeng Lv

Beihang University, Beijing, China

Beihang University, Beijing, China
View Profile

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningAugust 2017Pages 1653–1662https://doi.org/10.1145/3097983.3098018

Published:13 August 2017Publication History

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 1653–1662

ABSTRACT

Taxi-calling apps are gaining increasing popularity for their efficiency in dispatching idle taxis to passengers in need. To precisely balance the supply and the demand of taxis, online taxicab platforms need to predict the Unit Original Taxi Demand (UOTD), which refers to the number of taxi-calling requirements submitted per unit time (e.g., every hour) and per unit region (e.g., each POI). Predicting UOTD is non-trivial for large-scale industrial online taxicab platforms because both accuracy and flexibility are essential. Complex non-linear models such as GBRT and deep learning are generally accurate, yet require labor-intensive model redesign after scenario changes (e.g., extra constraints due to new regulations). To accurately predict UOTD while remaining flexible to scenario changes, we propose LinUOTD, a unified linear regression model with more than 200 million dimensions of features. The simple model structure eliminates the need of repeated model redesign, while the high-dimensional features contribute to accurate UOTD prediction. We further design a series of optimization techniques for efficient model training and updating. Evaluations on two large-scale datasets from an industrial online taxicab platform verify that LinUOTD outperforms popular non-linear models in accuracy. We envision our experiences to adopt simple linear models with high-dimensional features in UOTD prediction as a pilot study and can shed insights upon other industrial large-scale spatio-temporal prediction problems.

Supplemental Material

ye_online_platforms.mp4

mp4

402.3 MB

Download

References

Didi Chuxing. https://en.wikipedia.org/wiki/Didi_Chuxing.Google Scholar
Grab. https://en.wikipedia.org/wiki/Grab_(application).Google Scholar
Uber. https://en.wikipedia.org/wiki/Uber_(company).Google Scholar
A Anwar, M. Volkov, and D. Rus. 2013. Changinow: A mobile application for efficient taxi allocation at airports. In 16th International IEEE Conference on Intelligent Transportation Systems, The Hague, The Netherlands. 694--701. Google ScholarCross Ref
T. Cover and J. Thomas. 2006. Elements of information theory (2. ed.).Google Scholar
J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, and Q. Le. 2012. Large scale distributed deep networks. In Advances in neural information processing systems. 1232--1240.Google Scholar
J. Friedman. 2001. Greedy function approximation: a gradien boosting machine. Annals of statistics(2001).Google Scholar
X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah R. Herbrich, and S. Bowers. 2014. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, New York City, New York, USA. 5:1--5:9. Google ScholarDigital Library
G. Hinton and R. Salakhutdinov. 2006. Reducing the dimensionalit of data with neural networks. Science(2006), 504--507. Google ScholarCross Ref
Q. Ho, J. Cipar, H. Cui, S. Lee, J. Kim, P. Gibbons, G. Gibson, G. Ganger, and E. Xing. 2013. More effective distributed ml via a stale synchronous parallel parameter server. In Advances in neural information processing systems. 1223--1231.Google Scholar
M. Li, D. Andersen, J. Park, A. Smola, A. Ahmed, V. Josifovski, J. Long, E. Shekita, and B. Su. 2014. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation, Broomfield, CO, USA. 583--598. Google ScholarDigital Library
Y. Li, Y. Zheng, Y. Zhang, and L. Chen. 2015. Traffic prediction in a bike-sharing system. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA. 33:1--33:10. Google ScholarDigital Library
B. McMahan, G Holt, D Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, S. Chikkerur, D. Liu, M. Wattenberg, A. Hrafnkelsson, T. Boulos, and J. Kubica. 2013. Ad click prediction: a view from the trenches. In The 19th ACM International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA. 1222--1230. Google ScholarDigital Library
L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas. 2013. Predicting taxi-passenger demand using streaming data. IEEE Transactions on Intelligent Transportation Systems (2013), 1393--1402. Google ScholarDigital Library
N. Mukai and N. Yoden. 2012. Taxi demand forecasting based on taxi probe data by neural network. In Intelligent Interactive Multimedia: Systems and Services. 589--597. Google ScholarCross Ref
J. Sun, D. Papadias, Y. Tao, and B. Liu. 2004. Querying about the past, the present, and the future in spatio-temporal. In Proceedings of the 20th International Conference on Data Engineering, Boston, MA, USA. 202--213. Google ScholarCross Ref
Y. Tong, Y. Chen, Z. Zhou, L. Chen, J. Wang, Q. Yang, and J. Ye. 2017. The simpler the better: a unified approach to predicting original taxi demands on large-scale online platforms (Techinical report). http://www.cse.ust.hk/~yxtong/tr_prediction.pdf. (2017).Google Scholar
Y. Tong, J. She, B. Ding, L. Chen, T. Wo, and K. Xu. 2016. Online Minimum Matching in Real-Time Spatial Data: Experiments and Analysis. In Proceedings of the 42nd International Conference on Very Large Databases, New Delhi, India. 109--118. Google ScholarDigital Library
Y. Tong, J. She, B. Ding, L. Wang, and L. Chen. 2016. Online mobile Micro-Task Allocation in spatial crowdsourcing. In 32nd IEEE International Conference on Data Engineering, Helsinki, Finland. 49--60. Google ScholarCross Ref
V. Vapnik and V. Vapnik. 1998. Statistical learning theory. Wiley New York.Google Scholar
Li X., Pan G., Wu Z., Qi G., S. Li, D Zhang, W. Zhang, and Z. Wang. 2012. Prediction of urban human mobility using large-scale taxi traces and its applications. Frontiers of Computer Science in China (2012), 111--121.Google Scholar
L. Xiao. 2010. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research (2010), 2543--2596.Google Scholar
J. Yuan, Y. Zheng, X. Xie, and G. Sun. 2011. Driving with knowledge from the physical world. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA. 316--324. Google ScholarDigital Library
J. Yuan, Y. Zheng, C. Zhang, W. Xie, G. Xie, X.and Sun, and Y. Huang. 2010. T-drive: driving directions based on taxi trajectories. In 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, San Jose, CA, USA, Proceedings. 99--108.Google Scholar
J. Yuan, Y. Zheng, L. Zhang, X. Xie, and G. Sun. Where to find my next passenger. In Proceedings of the 13th international conference on Ubiquitous computing, Beijing, China. 109--118. Google ScholarDigital Library
M. Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv preprint (2012).Google Scholar
K. Zhang, Z. Feng, S. Chen, K. Huang, and G. Wang. 2016. A Framework for Passengers Demand Prediction and Recommendation. In IEEE International Conference on Services Computing, San Francisco, CA, USA. 340--347. Google ScholarCross Ref
K. Zhao, D. Khryashchev, J. Freire, C. Silva, and H. Vo. 2016. Predicting taxi demand at high spatial resolution: approaching the limit of predictability. In 2016 IEEE International Conference on Big Data, Washington DC, USA. 833--842. Google ScholarCross Ref
Y. Zheng, L. Capra, O. Wolfson, and H. Yang. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (2014), 38:1--38:55.Google Scholar
Y. Zheng, Y. Liu, J. Yuan, and X. Xie. 2011. Urban computing with taxicabs. In Proceedings of the 13th international conference on Ubiquitous computing, Beijing, China. 89--98 Google ScholarDigital Library

Index Terms

The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands based on Large-Scale Online Platforms
1. Information systems
  1. Information systems applications
    1. Data mining
    2. Spatial-temporal systems
      1. Geographic information systems

Recommendations

Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering
Abstract
Machine learning has proved to generate useful predictive models that can and should support decision makers in many areas. The availability of tools for AutoML makes it possible to quickly create an effective but complex predictive ...
Highlights
- Complex models are not always better.
- SAFE extracts interpretable features from ...
Read More
Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models
FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

Unsupervised models do not require the defect data to build the prediction models and hence incur a low building cost and gain a wide application range. Consequently, it would be more desirable for practitioners to apply unsupervised models in effort-...
Read More
Limits of Data Value Predictability

The predictability of data values is studied at a fundamental level. Two basic predictor models are defined: Computational predictors perform an operation on previous values to yield predicted next values. Examples we study are stride value prediction ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN:9781450348874
DOI:10.1145/3097983
General Chairs:
Stan Matwin
Dalhousie University
,
Shipeng Yu
LinkedIn
,
Faisal Farooq
IBM
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 August 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
feature engineering
prediction
unit original taxi demands
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '17 Paper Acceptance Rate64of748submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 200
  Total Citations
  View Citations
- 2,471
  Total Downloads
- Downloads (Last 12 months)128
- Downloads (Last 6 weeks)21
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands based on Large-Scale Online Platforms

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering

Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models

Limits of Data Value Predictability

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands based on Large-Scale Online Platforms

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering

Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models

Limits of Data Value Predictability

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media