Order Classification

A classification problem was also set up from the given data. This section provides an overview of the models used and the results obtained.

Each order was assigned a postal code using the Google Maps reverse geocoding API, which estimates the address of a location from a point (latitude, longitude).
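The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the third-party `googlemaps` Python client, and the helper names `extract_postal_code` and `postal_code_for_point` are hypothetical. The sample response is hand-written in the general shape of a geocoding result and is not real API output.

```python
def extract_postal_code(results):
    """Return the first postal code found in reverse-geocoding results, else None."""
    for result in results:
        for component in result.get("address_components", []):
            if "postal_code" in component.get("types", []):
                return component["long_name"]
    return None


def postal_code_for_point(lat, lng, api_key):
    """Look up a postal code for a point (needs network access and an API key)."""
    import googlemaps  # third-party client, imported lazily (assumed to be installed)

    client = googlemaps.Client(key=api_key)
    return extract_postal_code(client.reverse_geocode((lat, lng)))


# Hand-written sample in the shape of a geocoding response (placeholder values).
sample = [
    {
        "address_components": [
            {"long_name": "Example City", "types": ["locality", "political"]},
            {"long_name": "12345", "types": ["postal_code"]},
        ]
    }
]
print(extract_postal_code(sample))  # 12345
```

Keeping the parsing separate from the network call makes it possible to check the extraction logic without an API key.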

Orders where the VENUE and USER share a common postal code are assigned the class IN, implying the order originated within the venue's postal code. Otherwise, the order is assigned the class OUT, indicating it was received from outside the postal code. The table below shows that most orders originate outside the venue's postal code region, so the class imbalance is significant.

	IN	OUT
ORDERS	3983	14446
(%)	21.6%	78.4%
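As a rough sketch of the labelling rule and the class shares in the table above (the function name `order_class` is hypothetical, and both postal codes are assumed to be present for every order):

```python
def order_class(venue_postal_code, user_postal_code):
    """Label an order IN when venue and user share a postal code, else OUT."""
    return "IN" if venue_postal_code == user_postal_code else "OUT"


# Class counts taken from the table above.
counts = {"IN": 3983, "OUT": 14446}
total = sum(counts.values())

# Percentage share of each class, rounded to one decimal place.
shares = {cls: round(100 * n / total, 1) for cls, n in counts.items()}
print(shares)  # {'IN': 21.6, 'OUT': 78.4}

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(counts.values()) / total
print(round(majority_baseline, 3))  # 0.784
```

Because of the imbalance, a model that always predicts OUT would already reach about 78.4% accuracy, which may be a useful reference point when reading accuracy figures.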

A further evaluation of the feature averages, stratified by these classes, indicates that OUT orders tend to be about 1.15 km from the venue, while IN orders tend to be about 560 m away.

FEATURE	IN (mean)	OUT (mean)
EST_ACT_Diff	-1.075320	-1.242697
ITEM_COUNT	2.749937	2.670566
ESTIMATED_DELIVERY_MINUTES	31.217926	34.554340
ACTUAL_DELIVERY_MINUTES	30.142606	33.311643
CLOUD_COVERAGE	12.410244	11.882874
TEMPERATURE	16.994502	16.967756
WIND_SPEED	3.781489	3.793611
PRECIPITATION	0.320309	0.342568
USER_VENUE_DIST	0.558221	1.152561
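The stratified means in the table above could be computed along these lines. This is only a sketch using the standard library: the `CLASS` column name and the helper `mean_by_class` are hypothetical, and the rows are made-up values, not data from the project.

```python
from collections import defaultdict
from statistics import mean


def mean_by_class(rows, class_key, features):
    """Return {class: {feature: mean}} for the given feature names."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[class_key]].append(row)
    return {
        cls: {f: mean(r[f] for r in members) for f in features}
        for cls, members in grouped.items()
    }


# Made-up rows for illustration only.
orders = [
    {"CLASS": "IN", "USER_VENUE_DIST": 0.4, "ITEM_COUNT": 2},
    {"CLASS": "IN", "USER_VENUE_DIST": 0.6, "ITEM_COUNT": 4},
    {"CLASS": "OUT", "USER_VENUE_DIST": 1.0, "ITEM_COUNT": 3},
    {"CLASS": "OUT", "USER_VENUE_DIST": 1.5, "ITEM_COUNT": 1},
]

result = mean_by_class(orders, "CLASS", ["USER_VENUE_DIST", "ITEM_COUNT"])
for cls, means in result.items():
    print(cls, means)
```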

Classification Models

The following classification models were applied:

Logistic regression model
Support Vector Machine (SVM)
Decision Tree (still under evaluation)

The logistic regression model was cross-validated with 10 folds and 10 repetitions, while the SVM model was cross-validated with 5 folds and 3 repetitions (because of performance constraints at higher fold and repetition counts).
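To illustrate what the two repeated cross-validation schemes imply, here is a minimal standard-library sketch of a repeated k-fold splitter. It is not the project's implementation: the function name is hypothetical, stratifying by class is an assumption (the text does not say whether the folds were stratified), and the labels are synthetic.

```python
import random


def repeated_stratified_kfold(labels, n_splits, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        # Shuffle indices within each class, then deal them round-robin so
        # every fold keeps roughly the overall class ratio.
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)
        # Each fold serves once as the test set within a repetition.
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


# Synthetic labels with roughly the imbalance described earlier.
labels = ["IN"] * 20 + ["OUT"] * 80

lr_splits = list(repeated_stratified_kfold(labels, n_splits=10, n_repeats=10))
svm_splits = list(repeated_stratified_kfold(labels, n_splits=5, n_repeats=3))
print(len(lr_splits), len(svm_splits))  # 100 15
```

The 10 x 10 scheme requires 100 model fits and the 5 x 3 scheme only 15, which is consistent with choosing the smaller scheme for the slower model.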

The image below shows a comparison of the accuracy levels between the logistic regression and SVM models.

[Figure: lr_svm_acc]

Wolt Orders

This project is an analysis of order data from Wolt, a food delivery platform.
