Prediction and optimization of sharing bikes queuing model in grid of Geohash coding

Abstract

Dockless bike-sharing systems offer a park-anywhere feature and an environment-friendly travel option for commuters, and they are booming all over the world. Previous studies, by contrast, focus mainly on docked systems, for example, rental modes and docking station planning. Because the temporal and geographic patterns of human mobility lead to a bike imbalance problem, we modeled human mobility patterns, predicted bike usage, and optimized the management of the bike-sharing service. First, we proposed adaptive Geohash-grid clustering to classify bike flow patterns. For simplicity and rapid modeling, we defined three queuing models: over-demand, self-balance, and over-supply. Second, we improved the adaptive Geohash-grid clustering-support vector machine algorithm to recognize the self-balance pattern. Third, based on the result of adaptive Geohash-grid clustering-support vector machine, we proposed a Markov state prediction model and a Poisson mixture model expectation-maximization algorithm. Based on data sets from Mobike and OFO, we conducted experiments to evaluate the models. Results show that our models offer better prediction and optimization performance.

Keywords

Bike-sharing systems, Geohash coding, grid state, queuing model, Markov chain, expectation-maximization

Introduction

With the development of technology, dockless bike-sharing systems (BSSs) have solved the last mile problem in intelligent city life.¹ BSSs are booming all over the world, especially in large cities. In the traditional self-service mode, users have to rent or return shared bikes at fixed stations. Based on the mobile Internet, global positioning system (GPS), and location-based service (LBS), dockless BSSs allow users to start or end service at community curbsides, subway stations, and central business district (CBD) parking zones.

Since about 2015, the central problems for municipal administrations have included acquiring space to park the bikes and achieving efficient use of the bikes. Given the park-anywhere feature of bike sharing, the core of the issue comes down to two factors:

Owing to human mobility patterns and spatiotemporal factors, the imbalance problem is difficult to model and predict in dockless BSSs.

The phenomenon of the “bike-sharing graveyard” can occur anywhere, for example, on curbsides, where it blocks the path of pedestrians.

This is a supply and demand planning problem that changes over time and across locations.² The truck-based³ and the user-based approaches are two baseline approaches to solving the bike imbalance issue.

However, the truck recycling approach depends on demand prediction and manual intervention.

Motivation and incitement

In dockless BSSs, bikes are widely used in daily life. Many previous methods focus on demand prediction in docked BSSs. Researchers have implemented different strategies to address rebalancing, such as sending cargo trucks to relocate bikes before rush hours. Owing to the lack of supervision and control strategies, dockless BSSs tend toward extreme conditions, that is, too many illegally parked bikes at curbsides, bus stations, communities, and so on. The imbalanced usage pattern of bikes causes over-demand and over-supply issues not only for commuters but also for cities. Motivated by these challenges, we examine three methods: the station-centric model with global features,⁴ demand prediction,^5,6 and the free-floating bike-sharing model.⁷

To address the limitations of these approaches, our prediction model uses adaptive Geohash-grid clustering (AGC) to preprocess parking coordinate data. We describe how the imbalance stage changes in different Geohash grids. Then, we propose three demand models for dockless shared bikes: the over-demand model, the over-supply model, and the self-balance model. The expectation-maximization (EM) algorithm is derived to learn the parameters of the Poisson mixture model (PMM). According to queuing theory, we modeled bike usage patterns and human mobility patterns, predicted demand, and optimized the parameters.

We also showed that the prediction and optimization algorithms improve convergence and achieve a better performance compared with existing algorithms.

Literature review

This section summarizes research work on modeling, demand prediction, and parameter optimization. Mathematical modeling is the first step in solving prediction and optimization problems. From the viewpoint of BSS designers, route determination and demand prediction are closely related. Parameter values are critical decision indicators for managers and controllers who optimize a BSS. In our literature review, we concentrate on spatiotemporal, demand, and rebalance problems.

Many researchers focus on BSS modeling. Mathematical modeling⁸ has been used in recent work, such as the planning model,⁹ probability model,¹⁰ clustering algorithms,¹¹ loss function,³ and so on. In particular, the auto-regressive integrated moving average models¹² and auto-regressive moving average (ARMA)¹³ models are widely used in modeling human mobility patterns. According to the usage profile of BSS stations, Sayarshad et al.⁹ proposed a multi-periodic optimization formulation for planning problems. Crisostomi et al.¹⁰ proposed a Markov chain model. In the BSS, the Markov decision process is a tool for modeling commuter mobility patterns. Based on the discriminative functional mixture (DFM) model, Bouveyron et al.¹¹ proposed the FunFEM methodology. To minimize the total cost, Hu and Liu³ proposed an allocation model to solve the bike rental station and truck dispatching depot problems. According to travel patterns, MY Du et al.¹⁴ proposed a multinomial logit (MNL) model. In the free-floating bike-sharing setting, the MNL model consists of three categories: origin to destination pattern (ODP), travel cycle pattern (TCP), and transfer pattern (TP).

To redistribute bikes, spatiotemporal analysis is widely used in prediction algorithms. It helps to analyze a strategic design model for BSSs. According to temporal and spatial factors, Yang and Hu¹⁵ proposed a spatiotemporal bicycle mobility model. The temporal and geographic mobility patterns are applied to demand prediction.¹⁶ Based on spatiotemporal analysis, Froehlich et al.¹⁷ proposed a clustering technique for Barcelona’s BSS.

From the viewpoint of bike-sharing operators, demand prediction is a critical performance indicator. The number of rental bikes can be predicted with a clustering algorithm. Ciancia et al.² proposed a station occupancy predictor, a data mining framework that predicts the occupancy levels of the stations by Bayesian and associative classifiers. Based on station usage, long-term stability, and short-term volatility, Yao et al.¹⁸ proposed a three-step demand estimation model. Tensor factorization is also widely used in routing prediction for BSSs. YX Li et al.¹⁹ proposed a hierarchical prediction model that uses tensor factorization to extract latent user activity patterns. Among clustering-based methods for forecasting the availability of bikes and docks at each station,²⁰ the Gaussian mixture model (GMM)^21,22 and PMM²³ are common prediction schemes. LB Chen et al.²⁴ proposed a dynamic cluster-based framework for demand prediction.

Optimization of resource allocation is necessary to improve system performance (e.g. bike usage patterns and the rebalancing problem). Based on BSS optimization configuration, Ling et al.²⁵ developed a novel deep reinforcement learning algorithm called hierarchical reinforcement pricing (HRP) for the rebalance optimization problem in BSSs. Based on optimal facility allocation and pool sizing for BSSs,^26–28 Guoming and Lukasz²⁹ proposed two bikeshare pool sizing techniques that guarantee bike availability with high probability. Problems similar to bike-sharing resource optimization include the single vehicle one commodity capacitated pickup and delivery problem (SVOCPDP),³⁰ the one commodity pickup and delivery traveling salesman problem (1PDTSP),³¹ the Swapping Problem,³² and the split delivery problem.³³ The optimization algorithms mentioned above aim to find a minimum cost route for users renting and returning bicycles. Mostly, the split delivery problem and the branch-and-cut algorithm are handled through a tabu search algorithm.³³ The SVOCPDP gathers aspects from both the Swapping Problem³² and the 1PDTSP.³¹ The objective of the optimization is to find the lowest-cost solution with a heuristic algorithm. In the literature, artificial immune systems support vector machine (AIS-SVM),³⁴ artificial neural network (ANN)-SVM,³⁵ and particle swarm optimization methods have been widely applied to optimization and classification problems.

Currently, some papers adopt Monte Carlo simulation to predict the demand of clusters.²⁹ However, simulation results differ from real results. In our review of the related literature, there is little research available regarding modeling and predicting user behaviors in dockless systems.² In this paper, we focus on predicting the imbalance stage and optimizing parameters in dockless BSSs. Based on the real-world bike-sharing data set from Beijing, we combined Geohash coding and queuing theory to improve the SVM, and we optimized the parameters of the EM algorithm to adjust the weight values in the PMM.

Contribution and paper organization

As is evident from the review, there is abundant research regarding BSSs. To tackle our problem, we carefully design solutions that overcome the drawbacks of the literature discussed above. Based on geographic-grid clustering, we proposed the AGC approach to preprocess the check-in/out records in the parking data set. Starting from the park-anywhere feature and bike mobility, we study improved SVM (ISVM) classification and optimized EM parameters in the queuing models.

Our main contributions are as follows:

For the park-anywhere bike flow pattern problem, we formally define the states of over-demand, over-supply, and self-balance by processing coordinate data, and we model the state transitions with queuing theory.

For demand prediction, we improved the bi-classification algorithm to solve the three-stage classification problem. Based on Markov state prediction (MSP), we propose AGC-SVM to cope with dynamic demand.

For rebalancing, we proposed the PMM-EM model, which is solved by the PMM-EM algorithm.

The optimized parameters π_L and λ_L keep the bike system running with high probability.

To validate our methods, we compare them with eight baseline methods on real-world data: perceptron, decision tree, gradient boosting regression tree (GBRT), k-nearest neighbor (k-NN), k-means, ARMA, Kalman filter, and hidden Markov model (HMM). We measure the accuracy of the classification and regression algorithms through root mean squared error (RMSE), root mean squared logarithmic error (RMSLE), error rate, precision, recall, and F1, and the evaluation shows that our models significantly outperform the baselines.

This paper is organized as follows: in section “Overview,” we define our models, and in subsection “Framework,” we discuss the bike flow pattern models in BSSs. In section “Methodology,” we define AGC to handle the dynamic check-in/out demand. For the three-stage classification problem, we improve the SVM algorithm. We propose the PMM-EM model for imbalance stage prediction and rebalance parameter optimization. In section “Experiments,” we present experimental results to validate our method and discuss the merits and potential limitations of the approaches, which gives guidance on when to use which model in practice. In section “Conclusion and future works,” we outline future research directions.

Overview

This section defines the notations (see Table 1) and terminologies used in this paper.

Table 1.

Notations of model parameters.

Notation	Description
M/M/1	It means that the system has a Poisson arrival process, an exponential service time distribution, and one server
M/M/S	It means that the system has a Poisson arrival process, an exponential service time distribution, and S servers
AGC	adaptive Geohash-grid clustering
MSP	Markov state prediction
PMM	Poisson mixture model
M	Markovian
K	Capacity of queuing system or maximum queue size
S	The number of service windows
∞	System capacity and number of passengers
R_A/B/C	Geohash coding grid
p ^(A/B/C) ₀	Idle probability in queuing model
μ	Service rate in queuing model
λ	Arrival rate in queuing model
ρ	Utilization of the server = λ/μ
L	The number of shared bikes waiting in line to be serviced
W _A/B/C	Waiting time in queuing model
Δt	Time window
$X^{out}$	Check-out value
$X^{in}$	Check-in value
$ε$	Threshold from $\| X^{out} - X^{in} \|$
α	Cost error in SVM
β	Geometric margin
ω	Vector data
π	Probability of PMM

Preliminary and problem definition

Then, we define the terms used in this paper as follows.

Definition 1: Geohash coding

Geohash is a public domain geocode system invented in 2008 by Gustavo Niemeyer. The bits generated from the latitude are stored in List 1, whereas those generated from the longitude are stored in List 2. Lists 1 and 2 are then merged by interleaving: counting from 0, the odd bit positions denote the latitude, whereas the even bit positions denote the longitude. A total of 32 characters, namely, 0−9 and b−z (without a, i, l, and o), are used for the base 32 encoding of the merged list, that is, List 3. Publishing codes of different lengths serves privacy protection, because a shorter code reveals only a coarser location. For example, a code of length 6 represents a range of approximately 0.34 km², and a code of length 7 represents a range of about 76 × 76 m², as shown in Table 2.

Table 2.

Bits and accuracy.

Coding length	lat bits	lng bits	km error
1	2	3	±2500
2	5	5	±630
3	7	8	±78
4	10	10	±20
5	12	13	±2.4
6	15	15	±0.61
7	17	18	±0.076
8	20	20	±0.019
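The encoding described in Definition 1 can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the paper; the function name `geohash_encode` and its parameters are chosen here for illustration only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # 0-9 and b-z without a, i, l, o


def geohash_encode(lat: float, lng: float, length: int = 8) -> str:
    """Encode a latitude/longitude pair as a Geohash string of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lng = True  # bit 0 is a longitude bit, then the bits alternate
    while len(chars) < length:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            bit = lng >= mid
            if bit:
                lng_lo = mid
            else:
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        nbits += 1
        use_lng = not use_lng
        if nbits == 5:  # every 5 interleaved bits give one base 32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)
```

A shorter code is always a prefix of a longer code for the same point, which is what allows the grid size to be adjusted by truncating the code. For the example point used later in the paper (longitude 116.402843, latitude 39.999375), `geohash_encode(39.999375, 116.402843, 8)` gives `wx4g8c9v`, the lowercase form of WX4G8C9V.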

Definition 2: Geohash grid

The geospatial index technique is a search method that efficiently handles road, street, and district data. The grid index uses a hash data structure. Each grid corresponds to a bucket of the hash map (Figure 5).

Definition 3: bike flow patterns of check-in/out

In a given time window [t, t + Δt], the bike flow is defined as a tuple $X_{Δ t} = \{X^{in}, X^{out}\}$, where $X_{i}^{in}$ and $X_{i}^{out}$ are the numbers of bikes that end (check in) and start (check out) service in the Geohash grid during [t, t + Δt], respectively. We write the check-in and check-out series as $X^{in} = \{x_{1}^{in}, x_{2}^{in}, \dots, x_{n}^{in}\}$ and $X^{out} = \{x_{1}^{out}, x_{2}^{out}, \dots, x_{n}^{out}\}$. The check-out and check-in values are the numbers of bikes that activate or terminate the bike-sharing service in Δt, denoted as $x_{i Δ t}^{out}$ and $x_{i Δ t}^{in}$, and their total difference is

\sum_{i = 1}^{n} Δ X_{i} = \sum_{i = 1}^{n} | x_{i}^{out} - x_{i}^{in} |

(1)

Definition 4: grid state

Shared bikes are used by commuters, who pick them up anywhere and anytime. From the statistics of the data sets $X_{i}^{in}$ and $X_{i}^{out}$, we find two states and one quasi-state. We define the R_A grid state as over-demand, the R_B grid state as over-supply, and the R_C grid state as self-balance.

Definition 5: queuing theory

Queuing theory is the mathematical study of waiting lines or queues. In this paper, we apply queuing theory to model and analyze the number of shared bikes in service, waiting times, and so on. In a Geohash grid, shared bikes are considered customers, and parking is defined as entering the queuing system. The queue represents customers or shared bikes waiting for service. We propose three queuing models, described by Kendall’s notation (the standard system used to describe and classify a queuing node): M/M/1/∞ describes the over-demand stage, M/M/1/K describes the over-supply stage, and M/M/S/K describes the self-balance stage.

Definition 6: time window Δt

To capture the dynamic patterns of bike sharing, we studied the time range 6:00 a.m.–24:00 p.m.–01:00 a.m., which was divided into 36 time windows Δt (30 min each) in the training data set.

Problem Definition : imbalance and rebalance problems.

Framework

Based on the bike flow pattern, we model the three classification results with queuing theory. Because bike returns and rentals obey a Poisson process, we define three states that convert to each other through the bike flow, as shown in Figure 1.

Figure 1.

Imbalance and rebalance dynamic model.

Label A: over-demand stage

This type of state in a Geohash grid is named R_A; examples are bus station regions and office buildings. Label R_A describes the model in the over-demand stage. In this stage, few shared bikes arrive at the parking area in a period of time, as shown in the following equations

X^{out} - X^{in} > β

(2)

\sum_{i = 1}^{N} X_{i}^{out} - \sum_{i = 1}^{N} X_{i}^{in} > β

(3)

The capacity of the shared bike queuing system is ∞. The time interval between commuters arriving at the R_A grid follows a negative exponential distribution. We model the over-demand stage R_A with an M/M/1/∞ queue. The working of the model is shown in Figure 2.

Figure 2.

Over-demand stage modeling.

Figure 2 shows how the model works.

In the model, the idle probability is

p_{0}^{(A)} = \frac{1}{1 + \sum_{n = 1}^{K} ρ^{n}} = \left\{ \begin{matrix} \frac{1 - ρ}{1 - ρ^{K + 1}} & (ρ \neq 1) \\ \frac{1}{K + 1} & (ρ = 1) \end{matrix} \right.

(4)

When K ≠ 1 and K ≠ 0 (and ρ ≠ 1), the number of customers waiting in line for bikes is obtained as

\begin{matrix} L_{A} = \sum_{n = 1}^{K} n p_{n} = p_{0} ρ \sum_{n = 1}^{K} n ρ^{n - 1} \\ = \frac{ρ}{1 - ρ} - \frac{(K + 1) ρ^{K + 1}}{1 - ρ^{K + 1}} \end{matrix}

(5)
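Equations (4) and (5) are straightforward to evaluate numerically. The sketch below is a hedged illustration only; the function name `mm1k_metrics` and the ρ = 1 branch for L (which uses the uniform distribution p_n = 1/(K + 1), so that L = K/2) are additions made here and do not appear in the paper.

```python
def mm1k_metrics(lam: float, mu: float, K: int):
    """Return (p0, L) for a single-server queue with capacity K, per equations (4) and (5)."""
    rho = lam / mu  # utilization, rho = lambda / mu
    if abs(rho - 1.0) < 1e-12:
        # rho = 1: all K + 1 states are equally likely
        p0 = 1.0 / (K + 1)
        mean_in_system = K / 2.0
    else:
        p0 = (1 - rho) / (1 - rho ** (K + 1))
        mean_in_system = rho / (1 - rho) - (K + 1) * rho ** (K + 1) / (1 - rho ** (K + 1))
    return p0, mean_in_system
```

The closed form for L can be cross-checked against the direct sum of n p_n with p_n = p_0 ρ^n, which is a useful sanity test when the arrival and service rates are estimated from data.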

Arrow (8): the system remains in the over-demand stage.

Arrow (3): following a change in the bike flow pattern, arrow (3) means $x_{i Δ t}^{out}$ < $x_{i Δ t}^{in}$ in the BSS. The grid state is transformed from stage A to stage C.

Arrow (1): following the growth in the number of commuters terminating service, arrow (1) means $x_{i Δ t}^{out}$ < $x_{i Δ t}^{in}$ in the BSS. The grid state is transformed from stage A to stage B.

Label B: Over-supply stage

Label R_B describes the over-supply stage. In this state, many shared bikes arrive at the region, that is, $x_{i Δ t}^{out}$ < $x_{i Δ t}^{in}$ in Δt, as shown in the following equations

X^{out} - X^{in} < - β

(6)

\sum_{i = 1}^{N} X_{i}^{out} - \sum_{i = 1}^{N} X_{i}^{in} < - β

(7)

For example, in a subway station region, we name the over-supply stage R_B. The capacity of R_B is K. We model the over-supply stage R_B with an M/M/1/K queue, as shown in Figure 3.

Figure 3.

Over-supply stage modeling.

In this model, the idle probability of sharing bikes is

p_{0}^{(B)} = \frac{1}{1 + \sum_{n = 1}^{\infty} ρ^{n}} = {\left(\sum_{n = 0}^{\infty} ρ^{n}\right)}^{- 1} = 1 - ρ

p_{n}^{(B)} = ρ^{n} p_{0}^{(B)} = (1 - ρ) ρ^{n} \quad (n = 0, 1, 2, \dots)

(8)

\begin{matrix} L_{B} = \sum_{n = 0}^{\infty} n p_{n} = \sum_{n = 0}^{\infty} n (1 - ρ) ρ^{n} \\ = (ρ + 2 ρ^{2} + 3 ρ^{3} + \dots) - (ρ^{2} + 2 ρ^{3} + 3 ρ^{4} + \dots) \\ = \frac{ρ}{1 - ρ} = \frac{λ}{μ - λ} \end{matrix}

(9)
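As a numerical companion to equations (8) and (9), the following sketch evaluates the geometric state probabilities and the mean number in the system. It assumes the steady-state condition λ < μ, which the closed form requires; the function name `mm1_metrics` and the `n_max` parameter are illustrative.

```python
def mm1_metrics(lam: float, mu: float, n_max: int = 5):
    """Return ([p_0, ..., p_n_max], L) for the infinite-capacity formulas (8) and (9)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("a steady state requires lam < mu")
    probs = [(1 - rho) * rho ** n for n in range(n_max + 1)]
    mean_in_system = rho / (1 - rho)  # equals lam / (mu - lam)
    return probs, mean_in_system
```

With λ = 3 and μ = 4, for instance, ρ = 0.75, the idle probability is 0.25, and the mean number in the system is 3.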

Arrow (9): the system remains in the over-supply stage.

Arrow (2): following a change in the bike flow pattern, arrow (2) means $x_{i Δ t}^{out} > x_{i Δ t}^{in}$ . The state is transformed from stage B to stage C.

Arrow (6): following the growth in the number of commuters, arrow (6) means $x_{i Δ t}^{out} > x_{i Δ t}^{in}$ in the BSS. The state is transformed from stage B to stage A.

Label C: Self-balance stage

R_C stage is self-balance. It means $x_{i Δ t}^{out} \approx x_{i Δ t}^{in}$ , as shown in the following equation

| X^{out} - X^{in} | ⩽ β

(10)

| \sum_{i = 1}^{N} X_{i}^{out} - \sum_{i = 1}^{N} X_{i}^{in} | ⩽ β

(11)
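The three threshold rules in equations (2), (6), and (10) amount to a simple labeling function. The sketch below is illustrative; the function name `grid_state` is chosen here, and the value of β in the usage is hypothetical because the paper does not fix it at this point.

```python
def grid_state(x_out: int, x_in: int, beta: float) -> str:
    """Label a grid for one time window from its check-out and check-in counts."""
    diff = x_out - x_in
    if diff > beta:    # equation (2): over-demand, R_A
        return "A"
    if diff < -beta:   # equation (6): over-supply, R_B
        return "B"
    return "C"         # equation (10): self-balance, R_C
```

For example, with a hypothetical β = 10, a window with 60 check-outs and 3 check-ins is labeled A, whereas 12 check-outs and 9 check-ins give C.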

Unlike R_A and R_B defined above, this queuing system has several service centers. The capacity of the R_C stage is K. We model the self-balance stage R_C with an M/M/S/K queue. In the self-balance stage, the grid R_C has a high user density and a high demand for shared bikes. The working of the model is shown in Figure 4.

Figure 4.

Self-balance stage modeling.

In this model, the number of supplemented shared bikes is

μ_{n}^{(C)} = \left\{ \begin{matrix} n μ & (n = 1, 2, \dots, s) \\ s μ & (n = s, s + 1, \dots) \end{matrix} \right.

(12)

The probability of n bikes in the platform is

p_{n}^{(C)} = \left\{ \begin{matrix} \frac{ρ^{n}}{n!} p_{0} & (n = 1, 2, \dots, s - 1) \\ \frac{ρ^{n}}{s! s^{n - s}} p_{0} & (n ⩾ s) \end{matrix} \right.

(13)

The idle probability of sharing bikes is

p_{0}^{(C)} = {\left[\sum_{n = 0}^{s - 1} \frac{ρ^{n}}{n!} + \frac{ρ^{s}}{s! (1 - ρ_{s})}\right]}^{- 1}

(14)

The mean number of waiting users is denoted by

L_{C} = L_{Cq} + s + p_{0} \sum_{n = 0}^{s - 1} \frac{(n - s) ρ^{n}}{n!}

(15)
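Equations (14) and (15) can be evaluated as follows. This is a sketch under stated assumptions: ρ_s is taken to be ρ/s, the steady-state condition ρ_s < 1 is required, and the queue length L_Cq, which the text does not spell out, is computed with the standard multi-server textbook formula. The function name `mms_metrics` is illustrative.

```python
from math import factorial


def mms_metrics(lam: float, mu: float, s: int):
    """Return (p0, Lq, L) for s parallel servers, following equations (14) and (15)."""
    rho = lam / mu
    rho_s = rho / s  # assumption: per-server utilization
    if rho_s >= 1:
        raise ValueError("a steady state requires lam < s * mu")
    head = sum(rho ** n / factorial(n) for n in range(s))
    p0 = 1.0 / (head + rho ** s / (factorial(s) * (1 - rho_s)))  # equation (14)
    # Assumption: standard textbook expression for the mean queue length L_Cq
    lq = p0 * rho ** s * rho_s / (factorial(s) * (1 - rho_s) ** 2)
    # Equation (15): L = Lq + s + p0 * sum_{n < s} (n - s) rho^n / n!
    mean_in_system = lq + s + p0 * sum((n - s) * rho ** n / factorial(n) for n in range(s))
    return p0, lq, mean_in_system
```

The term s + p_0 Σ (n − s) ρ^n / n! in equation (15) is the mean number of busy servers, so the result can be checked against L = L_q + ρ.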

Arrow (4): the system remains in the self-balance stage.

Arrow (7): following a change in the bike flow pattern, arrow (7) means $x_{i Δ t}^{out} > x_{i Δ t}^{in}$ in the BSS. The state is transformed from stage C to stage A.

Arrow (5): following the growth in the number of commuters, arrow (5) means $x_{i Δ t}^{out} < x_{i Δ t}^{in}$ in the BSS. The grid state is transformed from stage C to stage B.

Methodology

We formulate the bike flow pattern as a Poisson process and model it with a queuing system. To predict the imbalance stage and optimize the rebalance parameters, we improved the AGC and AGC-SVM approaches and proposed the PMM-EM model, whose key components are described in the following subsections.

AGC

Geohash coding is an algorithm that transforms longitude and latitude point data. It is a hierarchical spatial data structure that subdivides space into grid-shaped buckets. To obtain bike statistics, we proposed AGC. First of all, we use the Geohash coding algorithm to process the parking data. Latitude and longitude coordinate data are transformed into grid layers on the map, as shown in Figure 5. After tagging with Geohash codes, we count $x_{i Δ t}^{out}$ and $x_{i Δ t}^{in}$ for every Geohash grid during Δt. The bike flow pattern in the Geohash grid is denoted as $\sum_{i = 1}^{n} x_{i}^{out}$ and $\sum_{i = 1}^{n} x_{i}^{in}$. We process the training data set as follows

\begin{matrix} \sum_{i = 1}^{n} Δ X_{i} = \sum_{i = 1}^{n} x_{i}^{out} - \sum_{i = 1}^{n} x_{i}^{in} \\ = \{Δ x_{1} (x_{1}^{out} - x_{1}^{in}), Δ x_{2} (x_{2}^{out} - x_{2}^{in}), \dots, Δ x_{n} (x_{n}^{out} - x_{n}^{in})\} \end{matrix}

(16)

Figure 5.

Adaptive Geohash-grid clustering.

A cluster groups the parking points in a grid by density; for example, grid WX4EQY is tagged as the over-supply stage and grid WX4EQV as the over-demand stage.

For grids whose Geohash codes share a prefix of the same length, we evaluate the statistical result by comparing $\sum_{i = 1}^{n} Δ X_{i}$ with the threshold parameter β. In this way, the AGC approach carries out both the bike flow statistics and the pattern recognition.

According to the number of bikes, we adjust the Geohash-grid size by choosing suitable prefix lengths. Different periods have different usage stages in the same Geohash grid. The adaptive Geohash grid not only predicts the imbalance stage but also protects privacy. In the next subsection, we introduce stage label classification by the ISVM algorithm.
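The aggregation step of AGC can be illustrated with a small sketch. The record layout `(geohash, window, kind)`, the function names, and in particular the rule for choosing the prefix length (the longest prefix for which every grid holds at least `min_events` records) are assumptions made here for illustration; the paper does not specify the selection rule in this section.

```python
from collections import Counter, defaultdict


def net_flow_by_prefix(records, prefix_len):
    """Sum x_out - x_in per (grid prefix, time window), as in equation (16)."""
    flow = defaultdict(int)
    for code, window, kind in records:
        key = (code[:prefix_len], window)
        flow[key] += 1 if kind == "out" else -1
    return dict(flow)


def choose_prefix_len(records, min_events, max_len=8, min_len=5):
    """Hypothetical rule: the finest grid in which no cell is too sparse."""
    for length in range(max_len, min_len - 1, -1):
        counts = Counter(code[:length] for code, _, _ in records)
        if counts and min(counts.values()) >= min_events:
            return length
    return min_len
```

Truncating a Geohash code to a shorter prefix merges neighboring cells, so a single pass over the records is enough to re-aggregate the flows at any grid size.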

ISVM label classification

In this subsection, we describe how to create stage labels for the training data. To predict the three states of the Geohash-grid bike flow pattern, we adopt three label stages from the queuing models. The self-balance stage is a fuzzy stage. Classification of the bike flow pattern is a convex quadratic programming problem. The support vector machine is well suited to bi-classification problems, as the method can significantly reduce the need for labeled training instances. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training point of any class. The details of the ISVM classification algorithm are as follows:

Hyperplane of state-label classification

\left\{ \begin{matrix} ω^{T} x - β = 0; & X_{label} \in R_{A} \text{ or } R_{B} \\ ω^{T} x - β = 1; & X_{label} \in R_{A} \text{ or } R_{C} \\ ω^{T} x - β = - 1; & X_{label} \in R_{C} \text{ or } R_{B} \end{matrix} \right.

(17)

Geometric margin

β = \frac{2}{‖ ω ‖}

(18)

Because the stage boundaries and label classification are fuzzy, we improved the SVM algorithm so that the solution distinguishes the “over-demand and self-balance” stages and the “over-supply and self-balance” stages. The fundamental idea behind SVM is to choose the hyperplane with the maximum margin β, that is, the optimal canonical hyperplane. The geometric margin problem thus becomes a convex minimization problem, as shown in Figure 6: maximizing the margin is equivalent to minimizing $‖ ω ‖$ . The objective function is $\min \frac{1}{2} ‖ ω ‖^{2}$ , and, by Lagrange duality, the Lagrangian with dual variables α is

L (ω, β, α) = \frac{1}{2} ‖ ω ‖^{2} - \sum_{i = 1}^{n} α_{i} (y_{i} (ω^{T} x_{i} - β) - 1)

(19)

Figure 6.

Fuzzy boundaries problem.

To do this, one needs to find the weight vector ω and the bias β that yield the maximum margin among all possible separating hyperplanes. The state is determined by the hyperplane that maximizes

\left\{ \begin{matrix} \max \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i, j = 1}^{n} α_{i} α_{j} y_{i} y_{j} x_{i}^{T} x_{j} \\ \text{s.t. } α_{i} ⩾ 0, i = 1, \dots, n \\ \sum_{i = 1}^{n} α_{i} y_{i} = 0 \end{matrix} \right.

(20)

ω = \sum_{i = 1}^{n} α_{i} y_{i} x_{i}

(21)
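The step from the Lagrangian in equation (19) to the dual problem in equation (20) and the weight vector in equation (21) follows from the stationarity conditions. The worked step below is the standard derivation, written in the notation of this section and reading the inner product in equation (19) as $ω^{T} x_{i}$.

```latex
\frac{\partial L}{\partial ω} = ω - \sum_{i = 1}^{n} α_{i} y_{i} x_{i} = 0
\quad \Rightarrow \quad ω = \sum_{i = 1}^{n} α_{i} y_{i} x_{i}

\frac{\partial L}{\partial β} = \sum_{i = 1}^{n} α_{i} y_{i} = 0

L (ω, β, α) = \frac{1}{2} ‖ ω ‖^{2} - ω^{T} \sum_{i = 1}^{n} α_{i} y_{i} x_{i}
            + β \sum_{i = 1}^{n} α_{i} y_{i} + \sum_{i = 1}^{n} α_{i}
            = \sum_{i = 1}^{n} α_{i}
            - \frac{1}{2} \sum_{i, j = 1}^{n} α_{i} α_{j} y_{i} y_{j} x_{i}^{T} x_{j}
```

The first condition is equation (21), the second is the equality constraint of equation (20), and substituting both back into the Lagrangian leaves the objective of equation (20), which depends on α only.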

Predicted imbalance stage

Based on the Markov (memoryless) property and the Bayes conditional probability formula, the Markov transition matrix is a square matrix describing the probabilities of moving from one state to another in a dynamic system. We divide the whole day into several time windows Δt, and the Markov transient evolution is

P^{(1)} = π (0) P^{(0)}

P^{(n)} = P^{(1)} \times P^{(n - 1)}

(22)

For simplicity of modeling and calculation, we name the combination of AGC and MSP as AGC-MSP. Different lengths of Geohash coding yield different bike flow patterns in AGC. AGC-MSP is a soft clustering method. It is used to predict the imbalance stage and deploy bikes in advance. In application, the two scenarios for bike flow pattern prediction, temporary stage scenarios and permanent state scenarios, can be used complementarily. We obtain the numbers of bikes that come into and go out of each Geohash grid from the recorded parking data and process them as the training data set.

Markov state transition rate matrix formula

P (1, n) = [\begin{matrix} P_{11} & \dots & P_{1 n} \\ ⋮ & ⋱ & ⋮ \\ P_{n 1} & \dots & P_{nn} \end{matrix}]

(23)

Temporary stage scenarios

To answer the first question, “which Geohash grid is in an imbalance state,” we first identified the bike check-out and check-in flows for each AGC grid. Following the state label classification by the threshold parameter β, we defined the state table of the transition matrix, as shown in Table 3.

Table 3.

Bikes and states data set in day 1.

Time window	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18
Bikes	57	9	16	4	8	20	7	38	21	11	9	38	82	6	3	8	14	1
State	A	C	B	C	C	B	C	A	B	B	B	A	A	C	C	B	B	C
Time window	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36
Bikes	98	15	18	36	4	51	72	8	12	9	5	39	3	7	45	36	29	22
State	A	B	B	A	C	A	A	C	B	B	B	A	B	B	A	A	A	A

The R_A, R_B, or R_C state is predicted by the Markov prediction approach. In practice, we obtained the historical data {57, 9, –16, 4, 8, –20, 7, 38, –21, –11, –9, 38, 82, 6, 3, –8, –14, 1, 98, –15, –18, 36, 4, 51, 72, 8, –12, –9, –5, 39, –3, –7, 45, 36, 29, 22} in the WX4G8C9V Geohash grid. State labels are shown in Tables 3 and 4.

Table 4.

Part of state transition in day 2.

No.	2nd			17th			30th
State	A	B	C	A	B	C	A	B	C
Probability	0.538	0.152	0.307	0.386	0.333	0.279	0.365	0.352	0.279

Transient evolution is

P^{(1)} = P^{(0)} [\begin{matrix} P_{AA} & P_{AB} & P_{AC} \\ P_{BA} & P_{BB} & P_{BC} \\ P_{CA} & P_{CB} & P_{CC} \end{matrix}]

(24)

One step probability transfer matrix

\begin{matrix} P^{(1)} = [\begin{matrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{matrix}] \\ = [\begin{matrix} \frac{5}{12} & \frac{3}{12} & \frac{4}{12} \\ \frac{4}{14} & \frac{7}{14} & \frac{3}{14} \\ \frac{3}{9} & \frac{4}{9} & \frac{2}{9} \end{matrix}] \end{matrix}

(25)

Probability of initial state-space is

π (0) = \left\{\frac{13}{36}, \frac{14}{36}, \frac{9}{36}\right\}
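The one-step transition matrix in equation (25) and the initial distribution π(0) can be recovered by counting consecutive state pairs in the day 1 labels of Table 3. The sketch below does exactly that with exact fractions; the function names are illustrative and the string `STATES` is simply the 36 state labels of Table 3 in time order.

```python
from fractions import Fraction

# State labels of the 36 time windows in Table 3 (day 1), in order
STATES = "ACBCCBCABBBAACCBBC" "ABBACAACBBBABBAAAA"


def transition_matrix(seq, labels="ABC"):
    """Estimate P[a][b] as (# transitions a -> b) / (# transitions leaving a)."""
    counts = {a: {b: 0 for b in labels} for a in labels}
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    matrix = {}
    for a in labels:
        total = sum(counts[a].values())
        matrix[a] = {b: Fraction(counts[a][b], total) for b in labels}
    return matrix


def initial_distribution(seq, labels="ABC"):
    """Share of time windows spent in each state."""
    return {a: Fraction(seq.count(a), len(seq)) for a in labels}


def step(dist, matrix, labels="ABC"):
    """One transient step: the row vector dist multiplied by the matrix."""
    return {b: sum(dist[a] * matrix[a][b] for a in labels) for b in labels}
```

Counting the pairs this way gives the rows (5/12, 3/12, 4/12), (4/14, 7/14, 3/14), and (3/9, 4/9, 2/9), and the state shares 13/36, 14/36, and 9/36, in agreement with the values above. Row A has 12 rather than 13 transitions because the last window of the day has no successor.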

Stable state scenarios

So far, we have obtained the grid states in different time periods. According to the Markov state transition matrix, the states of the same grid transfer to each other. The stable grid state is determined after several iterations. For example, the coordinates 116.402843 and 39.999375 are transformed to the Geohash code WX4G8C9V (19 × 19 m²), as shown in Figure 7.

Markov state transition rate matrix is

p^{(n)} = [\begin{matrix} 0.43 & 0.24 & 0.32 \\ 0.14 & 0.5 & 0.35 \\ 0.33 & 0.25 & 0.42 \end{matrix}]

(26)

\left\{ \begin{matrix} P_{A} = 0.43 P_{A} + 0.24 P_{B} + 0.32 P_{C} \\ P_{B} = 0.14 P_{A} + 0.5 P_{B} + 0.35 P_{C} \\ P_{C} = 0.33 P_{A} + 0.25 P_{B} + 0.42 P_{C} \end{matrix} \right.

(27)

P_{WX4G8C9V} = \{P_{A} = 0.4653, P_{B} = 0.3525, P_{C} = 0.1799\}
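The idea of iterating the transient evolution in equation (22) until the state distribution stops changing can be sketched as below. This is a generic power iteration, not the authors' procedure; as input it uses the one-step matrix of equation (25) and π(0), and it is not meant to reproduce the numerical values reported above. The function name and the tolerance are illustrative.

```python
def stable_distribution(P, start, tol=1e-12, max_iter=10_000):
    """Repeat dist <- dist * P until successive distributions differ by less than tol."""
    dist = list(start)
    size = len(P)
    for _ in range(max_iter):
        nxt = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# One-step matrix of equation (25), rows and columns ordered A, B, C
P_ONE_STEP = [
    [5 / 12, 3 / 12, 4 / 12],
    [4 / 14, 7 / 14, 3 / 14],
    [3 / 9, 4 / 9, 2 / 9],
]
PI_0 = [13 / 36, 14 / 36, 9 / 36]

stable = stable_distribution(P_ONE_STEP, PI_0)
```

The grid can then be tagged with the state that has the largest stable probability, which is one way to read the "final grid state" discussed in this subsection.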

Figure 7.

Grid state prediction.

We propose a prediction method that determines the final grid state by the Markov state transition matrix. For judging the state of a Geohash grid, λ_A and β are the important threshold parameters: λ describes the bike flow pattern with regard to the data sets X_in and X_out, and β is the geometric margin in the SVM algorithm.

Rebalance parameter optimization

The bike flow pattern in a Geohash grid can be predicted by iterating the Markov chain multiple times. In a grid of state R_A, the strategy is to allocate bikes in the morning. In a grid of state R_B, the strategy is to recycle bikes at night. When they depend only on operational experience and statistical summaries, the deployment and recycling quantities are uncertain.

Because shared bikes are mobile, the three states of a grid alternate every day. To answer the second question, “how to cope with the rebalance issue,” we proposed the PMM-EM model. At different times, the grid state obeys Poisson distributions with different parameters. The rebalance issue is therefore how to optimize the resources to be deployed and recycled. From the observed parking data set, we can hardly identify the distribution exactly. We therefore assume that, throughout the day, the bike usage obeys a Poisson mixture distribution, so that the probability of sample X_i is

p (x, φ) = \sum_{L = 1}^{3} π_{L} P (x = k | λ_{L})

(28)

\left\{ \begin{matrix} P_{A} (X = k) = \frac{λ_{A}^{k}}{k!} e^{- λ_{A}} \\ P_{B} (X = k) = \frac{λ_{B}^{k}}{k!} e^{- λ_{B}} \\ P_{C} (X = k) = \frac{λ_{C}^{k}}{k!} e^{- λ_{C}} \\ \sum_{L = 1}^{3} π_{L} = 1 \end{matrix} \right.

(29)

φ = (π_{A}, π_{B}, π_{C}, λ_{A}, λ_{B}, λ_{C})

(30)

Among them, $π_{L}$ is the mixing coefficient of the L-th Poisson distribution, where $π_{L} ⩾ 0$ and $\sum_{L = 1}^{n} π_{L} = 1$ .
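The Poisson mixture in equations (28) to (30) can be evaluated as in the sketch below, together with the logarithm of the product of mixture probabilities over all samples, which is the quantity the objective function maximizes. The function names and the parameter values in the usage are hypothetical; the Poisson term is computed in log space so that large counts do not overflow the factorial.

```python
from math import exp, lgamma, log


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k / k! * exp(-lam), computed via logarithms."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def mixture_pmf(k: int, weights, rates) -> float:
    """Equation (28): weighted sum of the component Poisson probabilities."""
    return sum(w * poisson_pmf(k, lam) for w, lam in zip(weights, rates))


def log_likelihood(samples, weights, rates) -> float:
    """Sum over samples of the log mixture probability."""
    return sum(log(mixture_pmf(x, weights, rates)) for x in samples)


# Hypothetical parameters phi = (pi_A, pi_B, pi_C, lambda_A, lambda_B, lambda_C)
WEIGHTS = (0.4, 0.4, 0.2)
RATES = (50.0, 15.0, 5.0)
```

Because the weights sum to 1 and each component is a probability distribution, the mixture probabilities over all k also sum to 1, which gives a quick check of any candidate parameter set.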

Objective function

\left\{ \begin{matrix} \max \log L (x_{i}, π, λ) \\ \text{s.t. } \sum_{L = 1}^{3} π_{L} = 1 \\ φ = \arg\max Q \\ \text{end condition: } L (φ^{(i)}) ⩾ L (φ^{(i + 1)}) \end{matrix} \right.

(31)

In the Poisson mixture distribution, the likelihood of the variables π and λ to be estimated is

L (x_{i}, π, λ) = \prod_{i = 1}^{36} p (x_{i}, φ)

(32)

\begin{matrix} \max \log L (x_{i}, π, λ) = \sum_{i = 1}^{36} \log p (x_{i}, φ) \\ = \sum_{i = 1}^{36} \log \sum_{L = 1}^{3} p (x_{i}, π_{L}, λ_{L}) \\ = \sum_{i = 1}^{36} \log \left(π_{A} \frac{λ_{A}^{x_{i}}}{x_{i}!} e^{- λ_{A}} + π_{B} \frac{λ_{B}^{x_{i}}}{x_{i}!} e^{- λ_{B}} + π_{C} \frac{λ_{C}^{x_{i}}}{x_{i}!} e^{- λ_{C}}\right) \end{matrix}

(33)

For the component with parameters π_A and λ_A, the Poisson term is

p (x_{i} = k) = π_{A} \frac{λ_{A}^{k}}{k!} e^{- λ_{A}}

(34)

Likelihood function

\begin{matrix} L (x_{1}, x_{2}, \dots, x_{n}, φ) = L (x_{n}, φ) = L (x_{n}, π_{L}, λ_{L}) \\ = \prod_{i = 1}^{36} \left\{\sum_{L = 1}^{3} π_{L} p (x_{i} = k)\right\} = \prod_{i = 1}^{36} \left(π_{A} \frac{λ_{A}^{x_{i}}}{x_{i}!} e^{- λ_{A}} + π_{B} \frac{λ_{B}^{x_{i}}}{x_{i}!} e^{- λ_{B}} + π_{C} \frac{λ_{C}^{x_{i}}}{x_{i}!} e^{- λ_{C}}\right) \\ = \left(π_{A} \frac{λ_{A}^{x_{1}}}{x_{1}!} e^{- λ_{A}} + π_{B} \frac{λ_{B}^{x_{1}}}{x_{1}!} e^{- λ_{B}} + π_{C} \frac{λ_{C}^{x_{1}}}{x_{1}!} e^{- λ_{C}}\right) \\ \times \left(π_{A} \frac{λ_{A}^{x_{2}}}{x_{2}!} e^{- λ_{A}} + π_{B} \frac{λ_{B}^{x_{2}}}{x_{2}!} e^{- λ_{B}} + π_{C} \frac{λ_{C}^{x_{2}}}{x_{2}!} e^{- λ_{C}}\right) \dots \left(π_{A} \frac{λ_{A}^{x_{36}}}{x_{36}!} e^{- λ_{A}} + π_{B} \frac{λ_{B}^{x_{36}}}{x_{36}!} e^{- λ_{B}} + π_{C} \frac{λ_{C}^{x_{36}}}{x_{36}!} e^{- λ_{C}}\right) \end{matrix}

(35)

Log-likelihood function:

\begin{matrix} \log L (φ) = \log \left\{\prod_{j = 1}^{36} f_{j} (x_{j}, φ)\right\} \\ = \sum_{j = 1}^{36} \log f_{j} (x_{j}, φ) = \sum_{j = 1}^{36} \log \sum_{L = 1}^{3} π_{L} f_{j} (x_{j}, λ_{L}) \end{matrix}

(36)
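The log-likelihood in equation (36) can be evaluated numerically. The following is a minimal Python sketch rather than the authors' implementation; the function names `poisson_pmf` and `mixture_log_likelihood` and the toy inputs are assumptions introduced only for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability lam^k / k! * exp(-lam), evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def mixture_log_likelihood(xs, pis, lams):
    """Sum over samples of log(sum_L pi_L * Poisson(x; lambda_L))."""
    total = 0.0
    for x in xs:
        mixture = sum(p * poisson_pmf(x, lam) for p, lam in zip(pis, lams))
        total += math.log(mixture)
    return total


# Hypothetical toy inputs, not taken from the experiments in this article.
toy_counts = [2, 0, 7, 3]
toy_pis = [0.5, 0.3, 0.2]
toy_lams = [1.0, 3.0, 8.0]
print(mixture_log_likelihood(toy_counts, toy_pis, toy_lams))
```

Working in log space inside `poisson_pmf` avoids overflow in the factorial for large counts.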

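If the bike flow pattern in the BSS obeys a Poisson distribution with a single λ, the estimates follow from setting the partial derivatives of the log-likelihood to zero

\left\{ \begin{matrix} \frac{\partial (\log L (φ))}{\partial π} = 0 \\ \frac{\partial (\log L (φ))}{\partial λ} = 0 \end{matrix} \right.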

(37)

Hidden variable γ

The log-likelihood function contains a sum inside the logarithm, which makes direct maximization difficult. We therefore add a hidden variable γ.

If x_t comes from sample type A, then γ_{t, A} = 1, γ_{t, B} = 0, and γ_{t, C} = 0, so the complete-data record is (x_t, 1, 0, 0)

\begin{matrix} L (x_{i}, γ_{(i, A)}, γ_{(i, B)}, γ_{(i, C)} | π_{L}, λ_{L}) = \prod_{L = 1}^{3} (π_{L} B (x_{i}; λ_{L}))^{γ_{i, L}} \\ = (π_{A} B (x_{i}; λ_{A}))^{γ_{i, A}} \times (π_{B} B (x_{i}; λ_{B}))^{γ_{i, B}} \times (π_{C} B (x_{i}; λ_{C}))^{γ_{i, C}} \\ = (π_{A} B (x_{i}; λ_{A}))^{1} \times (π_{B} B (x_{i}; λ_{B}))^{0} \times (π_{C} B (x_{i}; λ_{C}))^{0} \end{matrix}

(38)

We define the hidden variable γ as a K-dimensional binary random variable in which exactly one element equals 1 and all other elements equal 0

p (γ_{L} = 1) = π_{L}

(39)

With the mixture probability $p (x_{i}) = \sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})$, the probability that sample $x_{i}$ belongs to component k is

γ (i, k) = \frac{π_{k} f_{i} (x_{i}, λ_{k})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})}

For the first sample $x_{1}$, this gives

\left\{ \begin{matrix} γ (1, A) = \frac{π_{A} f_{1} (x_{1}, λ_{A})}{\sum_{L = 1}^{3} π_{L} f_{1} (x_{1}, λ_{L})} \\ γ (1, B) = \frac{π_{B} f_{1} (x_{1}, λ_{B})}{\sum_{L = 1}^{3} π_{L} f_{1} (x_{1}, λ_{L})} \\ γ (1, C) = \frac{π_{C} f_{1} (x_{1}, λ_{C})}{\sum_{L = 1}^{3} π_{L} f_{1} (x_{1}, λ_{L})} \end{matrix} \right.

(40)

Expectation step

The share of the data $x_{i}$ that is estimated to come from each Poisson component is obtained by averaging the hidden variables. For component A

\begin{matrix} E_{A} (γ) = \frac{γ (1, A) + γ (2, A) + \dots + γ (36, A)}{36} \\ = \frac{γ_{1 A} + γ_{2 A} + \dots + γ_{36 A}}{36} \end{matrix}

(41)

The expectation $E_{iL} (γ_{i}, L | x_{i}, π^{(i)}, λ^{(i)})$ for each component is

\left\{ \begin{matrix} E_{iA} = γ (i, A) = \frac{π_{A} f_{i} (x_{i}, λ_{A})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})} \\ E_{iB} = γ (i, B) = \frac{π_{B} f_{i} (x_{i}, λ_{B})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})} \\ E_{iC} = γ (i, C) = \frac{π_{C} f_{i} (x_{i}, λ_{C})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})} \end{matrix} \right.

(42)

\left\{ \begin{matrix} E_{A}^{(i)} (γ, L | x_{i}, π^{(i)}, λ^{(i)}) = \sum_{i = 1}^{36} (\frac{π_{A} f_{i} (x_{i}, λ_{A})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})}) \\ E_{B}^{(i)} (γ, L | x_{i}, π^{(i)}, λ^{(i)}) = \sum_{i = 1}^{36} (\frac{π_{B} f_{i} (x_{i}, λ_{B})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})}) \\ E_{C}^{(i)} (γ, L | x_{i}, π^{(i)}, λ^{(i)}) = \sum_{i = 1}^{36} (\frac{π_{C} f_{i} (x_{i}, λ_{C})}{\sum_{L = 1}^{3} π_{L} f_{i} (x_{i}, λ_{L})}) \end{matrix} \right.

(43)

Maximization

Once the probabilities γ are known, we estimate the proportion and rate of each Poisson component. We iteratively maximize the Q function until the most suitable parameter values are obtained, and these values are used to construct the optimization model

π^{(i + 1)}, λ^{(i + 1)} = \arg \max Q (π, λ, π^{(i)}, λ^{(i)})

(44)

Q (φ, φ^{(i)}) = \sum_{L = 1}^{3} \sum_{i = 1}^{36} γ (x_{i}; φ^{(i)}) \{ \log π_{L} + \log P (x_{i}, λ_{L}) \}

(45)

Setting the derivative of the Q function to zero gives the following update rules.
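As a worked step, the updates in equations (46) and (48) can be derived as follows. This is a sketch under the assumption that each component is a Poisson distribution, so that $\log P (x_{i}, λ_{L}) = x_{i} \log λ_{L} - λ_{L} - \log x_{i}!$; the Lagrange multiplier η is a symbol introduced here and does not appear elsewhere in the article.

```latex
Q = \sum_{L = 1}^{3} \sum_{i = 1}^{36} γ (i, L) \{ \log π_{L} + x_{i} \log λ_{L} - λ_{L} - \log x_{i}! \}

\frac{\partial Q}{\partial λ_{L}} = \sum_{i = 1}^{36} γ (i, L) ( \frac{x_{i}}{λ_{L}} - 1 ) = 0
\Rightarrow λ_{L} = \frac{\sum_{i = 1}^{36} γ (i, L) x_{i}}{\sum_{i = 1}^{36} γ (i, L)}

\frac{\partial}{\partial π_{L}} [ Q + η (1 - \sum_{L = 1}^{3} π_{L}) ] = \frac{\sum_{i = 1}^{36} γ (i, L)}{π_{L}} - η = 0
\Rightarrow π_{L} = \frac{\sum_{i = 1}^{36} γ (i, L)}{η}, \quad η = \sum_{L = 1}^{3} \sum_{i = 1}^{36} γ (i, L) = 36
```

The multiplier equals 36 because the responsibilities of each sample sum to 1 over the three components.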

Bike flow patterns

λ_{L}^{(i + 1)} = \frac{E_{1 L} x_{1} + E_{2 L} x_{2} + \dots + E_{36 L} x_{36}}{\sum_{i = 1}^{36} E_{iL} (γ_{i}, L | x_{i}, π^{(i)}, λ^{(i)})}

(46)

that is,

\left\{ \begin{matrix} λ_{A}^{(i + 1)} = \frac{\sum_{i = 1}^{36} (E_{iA} x_{i})}{E_{A}^{(i)} (γ, L | x_{i}, π^{(i)}, λ^{(i)})} \\ λ_{B}^{(i + 1)} = \frac{\sum_{i = 1}^{36} (E_{iB} x_{i})}{E_{B}^{(i)} (γ, L | x_{i}, π^{(i)}, λ^{(i)})} \\ λ_{C}^{(i + 1)} = \frac{\sum_{i = 1}^{36} (E_{iC} x_{i})}{E_{C}^{(i)} (γ, L | x_{i}, π^{(i)}, λ^{(i)})} \end{matrix} \right.

(47)

π_{L}^{(i + 1)} = \frac{E_{1 L} + E_{2 L} + \dots + E_{36 L}}{36}

(48)

that is,

\left\{ \begin{matrix} π_{A}^{(i + 1)} = \frac{\sum_{i = 1}^{36} (E_{iA})}{36} \\ π_{B}^{(i + 1)} = \frac{\sum_{i = 1}^{36} (E_{iB})}{36} \\ π_{C}^{(i + 1)} = \frac{\sum_{i = 1}^{36} (E_{iC})}{36} \end{matrix} \right.

(49)
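The expectation and maximization steps above can be sketched as a short Python loop. This is an illustrative sketch, not the authors' code: the counts in `X` are the 36 sample values tabulated with this article, the initial rates 17, 5, and 3 are the R_A initial values of Table 6, and the equal initial weights, the tolerance, and all function names are assumptions.

```python
import math

# The 36 observed counts x_1 ... x_36 from the sample table of this article.
X = [57, 9, 16, 4, 8, 20, 7, 38, 21, 11, 9, 38, 82, 6, 3, 8, 14, 1,
     98, 15, 18, 36, 4, 51, 72, 8, 12, 9, 5, 39, 3, 7, 45, 36, 29, 22]


def poisson_pmf(k, lam):
    """Poisson probability, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def log_likelihood(xs, pis, lams):
    """Mixture log-likelihood, as in equation (36)."""
    return sum(math.log(sum(p * poisson_pmf(x, l) for p, l in zip(pis, lams)))
               for x in xs)


def e_step(xs, pis, lams):
    """Responsibilities gamma(i, L) for every sample and component."""
    gammas = []
    for x in xs:
        weighted = [p * poisson_pmf(x, l) for p, l in zip(pis, lams)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    return gammas


def m_step(xs, gammas):
    """Updates of pi_L and lambda_L, as in equations (46) and (48)."""
    n, k = len(xs), len(gammas[0])
    col_sums = [sum(g[j] for g in gammas) for j in range(k)]
    lams = [sum(g[j] * x for g, x in zip(gammas, xs)) / col_sums[j]
            for j in range(k)]
    pis = [s / n for s in col_sums]
    return pis, lams


def fit(xs, pis, lams, max_iter=200, tol=1e-8):
    """Iterate E and M steps until the log-likelihood stops improving."""
    history = [log_likelihood(xs, pis, lams)]
    for _ in range(max_iter):
        pis, lams = m_step(xs, e_step(xs, pis, lams))
        history.append(log_likelihood(xs, pis, lams))
        if history[-1] - history[-2] < tol:
            break
    return pis, lams, history


# Equal initial weights are an assumption; the rates follow Table 6 (R_A).
pis, lams, history = fit(X, [1 / 3, 1 / 3, 1 / 3], [17.0, 5.0, 3.0])
print([round(p, 3) for p in pis], [round(l, 2) for l in lams])
```

The stopping rule mirrors the end condition of equation (31): iteration ends once a further step no longer increases the likelihood by more than the tolerance.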

The optimal parameter values are found, as shown in Figure 8.

The parameter set φ = (π_A, π_B, π_C, λ_A, λ_B, λ_C) estimated by the EM algorithm is

\hat{φ} = \arg \max L (x_{1}, x_{2}, \dots x_{i}, φ)

(50)

Figure 8.

Three stages of clustering.

Experiments

Based on historical parking data, we predicted the grid state and optimized the parameters.

Settings

Our experimental data come from the real-world databases of Mobike and OFO technology Co., Ltd.

Mobike data: We use Mobike data from Beijing covering 1 March 2017 to 30 October 2017. The train.csv and test.csv data sets contain 4,825,118 records. Each record consists of user ID, bike ID, bike type, start time, end time, geo-coordinates of the start location, and geo-coordinates of the end location. We denote these fields by the variable set {U_i, B_i, T_s, T_e, Ls, Ls_i (lat, lng), Le_i (lat, lng)}.

OFO data: We use OFO BSS data from Hangzhou covering 1 May 2018 to 30 June 2018. The data set contains 1,462,273 records. Each record consists of bike ID, start time, end time, and the geographical coordinates of the start and end locations. We denote these fields by the variable set {B_i, T_s, T_e, Ls, Ls_i (lat, lng), Le_i (lat, lng)}.

The experiments were executed on a computer running Windows 7 and MATLAB 2014a, with an Intel® Pentium IV 1.84 GHz CPU and 4 GB of RAM. First, we applied a statistical spatiotemporal approach based on time series analysis to obtain X_Δt = {X^in, X^out}. Second, we labeled the classes of the BSS training data set. Finally, we predicted the cluster of each Geohash grid.

Achievements

In this section, we experimentally evaluate the performance of AGC-MSP and PMM-EM in Geohash grids. AGC-MSP predicts the stage of each Geohash grid in the next time slot.

Multi-grid scenarios

First, we chose six districts of Beijing and divided them into Geohash grids: Dongcheng district into 320 grids, Xicheng district into 355 grids, Fengtai district into 471 grids, Haidian district into 1035 grids, Chaoyang district into 1344 grids, and Changping district into 1833 grids. We applied AGC-MSP for stage prediction, as shown in Figure 9.

Figure 9.

Predicted imbalance stage.

Single-grid scenarios

The number of bikes is calculated by the function $X^{in} - X^{out}$ , where a positive value indicates the number of bikes to be deployed and a negative value indicates the redundant bikes to be recycled, as shown in Tables 5 and 6.

Table 5.

Optimization result.

Geohash	1st	5th	11th	16th	19th	23rd	27th	30th	35th
wx4g46j	9	2	−5	−10	−13	−17	−8	20	13
wx4g1wb	8	−3	−9	15	15	18	13	−6	3
wx4g1uk	5	24	42	19	−6	13	20	14	17

Table 6.

PMM parameter result.

Initial value in WX4G7PT
Stage	R_A			R_B			R_C
Type	A	B	C	A	B	C	A	B	C
λ	17	5	3	11	4	2	16	14	1

R_A solution

K is the capacity of the R_A queuing model system. A queue capacity of K = 1 means that one commuter is searching for a shared bike. In the over-demand stage, commuters can choose any means of transport in the Geohash grid. In this system, the service rate is μ = 30 min/7 bikes = 4.2 min/bike, the bike arrival rate is λ = 5 bikes/h, and the service intensity is ρ = λ/μ = 1.19.

When the queue capacity is K ≠ 1, for example, when several users are waiting for a bike, the parameters are as shown in Table 7.

Table 7.

Parameters for different numbers of commuters waiting for a bike in R_A.

Parameters	K_commuter = 2	K_commuter = 3	K_commuter = 4
$p_{0}^{(A)}$	$\frac{1 - ρ}{1 - ρ^{K + 1}}$ = 0.277	$\frac{1 - ρ}{1 - ρ^{K + 1}}$ = 0.186	$\frac{1 - ρ}{1 - ρ^{K + 1}}$ = 0.137
$p_{k - lost}^{(A)}$	$ρ^{2} p_{0}^{(A)}$ = 0.399	$ρ^{3} p_{0}^{(A)}$ = 0.321	$ρ^{4} p_{0}^{(A)}$ = 0.28
$L_{A}$	$\frac{ρ}{1 - ρ} - \frac{(2 + 1) ρ^{2 + 1}}{1 - ρ^{2 + 1}}$ = 3.57	$\frac{ρ}{1 - ρ} - \frac{(K + 1) ρ^{K + 1}}{1 - ρ^{K + 1}}$ = 2.55	$\frac{ρ}{1 - ρ} - \frac{(K + 1) ρ^{K + 1}}{1 - ρ^{K + 1}}$ = 2.40
$W_{A}$	$\frac{L}{λ (1 - p_{A 2})}$ = 1.342	$\frac{L}{λ (1 - p_{A 3})}$ = 0.908	$\frac{L}{λ (1 - p_{A 4})}$ = 0.869

In the over-demand stage, the service intensity of the queuing system in grid R_A is $ρ_{A}$ > 1, the idle probability of shared bikes is p^(A)₀ ≈ 20%, the probability of losing users is p^(A)_k-lost ≈ 40% (K = 1, 2, 3…), the average number of commuters is L_A ≈ 3, and the waiting time is W_A ≈ 1 h. Because of the strongly bursty behavior of commuters, the number of bikes cannot match user demand.
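The idle and loss probabilities of the finite-capacity queue can be reproduced with a few lines of Python. This is a minimal sketch using the closed-form expressions from the first two rows of Table 7; the function names are assumptions. For ρ = 1.19, the results come out close to the tabulated p_0 and loss values, and small differences may stem from rounding.

```python
def mm1k_idle_probability(rho, capacity):
    """p_0 = (1 - rho) / (1 - rho^(K + 1)), valid for rho != 1."""
    return (1.0 - rho) / (1.0 - rho ** (capacity + 1))


def mm1k_loss_probability(rho, capacity):
    """p_K = rho^K * p_0: an arriving commuter finds the queue full."""
    return rho ** capacity * mm1k_idle_probability(rho, capacity)


rho_A = 1.19  # service intensity quoted for grid R_A
for capacity in (2, 3, 4):
    print(capacity,
          round(mm1k_idle_probability(rho_A, capacity), 3),
          round(mm1k_loss_probability(rho_A, capacity), 3))
```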

R_B solution

In the over-supply state R_B, the service intensity is 0 < $ρ_{B}$ < 1. The idle probability of bikes is p^(B)₀ = 60%. The length of the commuter queue is 0 < L_B < 1.

Thus, commuters do not need to wait. In the R_B state, several bikes are available for commuters to pick up in the grid parking zone, so bike-sharing resources are imbalanced. This is the “bike graveyard” phenomenon described in news reports.

In the R_B state, shared bikes are used inefficiently. After several iterations, the Markov transition matrix reaches a steady state. With arrivals following a Poisson process with rate λ, the number of bike check-ins is μ_B = 20 bikes/h according to the historical data. The remaining bikes are idle in the R_B Geohash grid. According to PMM-EM, the optimized parameter is λ_B = 8 bikes/h. The remaining bikes will be recycled by the truck dispatching strategy.³ The intensity of the queuing model system is ρ_B = 0.4 < 1. The p^(B)₀ value is larger than the p^(B)₃ value. The result shows that users are not required to wait, as shown in Table 8.

Table 8.

Model parameters in region R_B.

$p_{0}^{(B)}$	$p_{3}^{(B)}$	$p_{5}^{(B)}$	L_B	W_B
0.6	0.038	0.028	0.67	5 min
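Several R_B figures can be cross-checked with the standard M/M/1 formulas. The sketch below assumes that λ_B = 8 bikes/h acts as the arrival rate and μ_B = 20 bikes/h as the service rate of a single-server queue with unlimited capacity; the function names are assumptions and the code is not taken from the authors.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Return (rho, p0, L, W) for a stable M/M/1 queue."""
    rho = arrival_rate / service_rate
    if not 0.0 < rho < 1.0:
        raise ValueError("M/M/1 steady state requires 0 < rho < 1")
    p0 = 1.0 - rho                 # probability that the system is idle
    mean_in_system = rho / (1.0 - rho)
    mean_time = mean_in_system / arrival_rate  # Little's law, in hours
    return rho, p0, mean_in_system, mean_time


def mm1_state_probability(arrival_rate, service_rate, n):
    """p_n = (1 - rho) * rho^n."""
    rho = arrival_rate / service_rate
    return (1.0 - rho) * rho ** n


rho_B, p0_B, L_B, W_B = mm1_metrics(8.0, 20.0)
print(rho_B, p0_B, round(L_B, 2), round(W_B * 60, 1), "min")
print(round(mm1_state_probability(8.0, 20.0, 3), 3))
```

Under these assumptions the sketch gives ρ_B = 0.4, an idle probability of 0.6, L_B ≈ 0.67, and a waiting time of 5 min, in line with the corresponding entries of Table 8.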

R_C solution

Table 9 lists the optimized parameters, under which bikes arrive at the parking platform following a Poisson distribution.

Table 9.

Optimized value in WX4G7PT.

π	0.66	0.19	0.25	0.24	0.61	0.15	0.32	0.19	0.49
λ	29	4	1	6	8	3	11	7	6

In the self-balance Geohash grid, the parameters are as follows: the idle probability is p^(C)₀ = 17%, the waiting time for service is 0.09 h, the arrival rate is λ_C = 7 bikes/h, the service completion rate is μ_C ≈ 5.67, the intensity is ρ_C = 1.23, and the number of users waiting in line is L_C = 1.24. When the number of counters is S_counter = 3, 5, or 7, the parameter indicators are as presented in Table 10.

Table 10.

R_C parameters indicators.

Parking platform	S = n = 3	S = n = 5	S = n = 7
Idle p⁽³⁾₀	0.1705	0.1701	0.1701
Probability of n p⁽³⁾_n	0.1929	0.162	0.102
Average number L₃	1.244	1.231	1.231

Because shared bikes in R_C are only occasionally idle, customer selection behavior becomes stable. No case of over-demand or over-supply was observed, and the number of bikes maintains a stable self-balance.

Systems solution

\min P_{R - Adaptive} (p) = \left\{ \begin{matrix} R_{A} = \min L (p | ρ) \prod_{i = 1}^{M} p_{k - lost}^{Ai} \\ R_{B} = \min \prod_{i = 1}^{M} p_{0 - idle}^{Bi} \\ R_{C} = \frac{1}{2} \sum {Grid}_{Geohash}^{i} {(x_{in} - x_{out})}^{2} \end{matrix} \right.

(51)

p_{k - lost}^{A} = (λ / μ)^{K} p_{0}^{A}, \quad p_{0}^{A} = \frac{1 - λ / μ}{1 - {(λ / μ)}^{K + 1}}

(52)

\begin{matrix} P_{0 - idle}^{Bi} = 1 - \frac{λ}{μ} \\ = 1 - (\frac{\text{number of users} / \text{h}}{1 / \text{service hours}}) \end{matrix}

(53)

Min C_{num} (x) = θ_{A} N_{A} - θ_{B} N_{B} + θ_{C} N_{C}

(54)

Bike rebalance strategy in grid R_A

N_{A} = N'_{A} + T (t ⩽ 36) π_{A} λ_{A} - T (t ⩽ 36) π_{B} λ_{B}

(55)

When

\left\{ \begin{matrix} N_{A} > 0; \text{ put in} \\ N_{A} = 0; \text{ keep on} \\ N_{A} < 0; \text{ remove or stop} \end{matrix} \right.

(56)

where N_A is the optimized number of bikes put into R_A and $N'_{A}$ is the initial quantity in R_A.
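The decision rule for grid R_A can be written as a small function. This is only an illustrative sketch: it assumes that T(t ⩽ 36) can be read as an elapsed number of time slots t with t ⩽ 36, and the function names and sample inputs are hypothetical.

```python
def rebalance_quantity_ra(initial_bikes, t, pi_a, lam_a, pi_b, lam_b):
    """N_A = N'_A + t * pi_A * lambda_A - t * pi_B * lambda_B, with t <= 36."""
    if not 0 <= t <= 36:
        raise ValueError("t is assumed to lie in the range 0..36")
    return initial_bikes + t * pi_a * lam_a - t * pi_b * lam_b


def rebalance_action_ra(n_a):
    """Map the sign of N_A to the action listed in the article."""
    if n_a > 0:
        return "put in"
    if n_a < 0:
        return "remove or stop"
    return "keep on"


# Hypothetical inputs, chosen only to exercise the three branches.
print(rebalance_action_ra(rebalance_quantity_ra(10, 2, 0.5, 4.0, 0.25, 8.0)))
print(rebalance_action_ra(rebalance_quantity_ra(0, 1, 0.5, 4.0, 0.5, 4.0)))
print(rebalance_action_ra(rebalance_quantity_ra(0, 1, 0.25, 4.0, 0.5, 4.0)))
```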

Bike rebalance strategy in grid R_B

N_{B} = N'_{B} + π_{B} λ_{B} t - π_{A} λ_{A} t, \quad t ⩽ 36

(57)

When

\left\{ \begin{matrix} N_{B} > 0; \text{ remove} \\ N_{B} = 0; \text{ keep on} \\ N_{B} < 0; \text{ put in or stop} \end{matrix} \right.

(58)

where N_B is the optimized number of bikes recycled from R_B, $N'_{B}$ is the initial quantity in R_B, and θ is the penalty factor.

The optimized objective function for BSS rebalancing is

\begin{matrix} F (X (x_{out}, x_{in}), T (t_{i}), π_{L}, λ_{L}, θ_{L}, ω) = \\ \left\{ \begin{matrix} \min P_{R - Adaptive} (p) = \sum_{L}^{3} R_{L} \\ \min C_{number} (x) = \sum_{L}^{3} θ_{L} N_{L} \\ \min \frac{{‖ ω ‖}^{2}}{2} \end{matrix} \right. \end{matrix}

(59)

Result

In the experiments, two models are designed as the classification and prediction frameworks. For bike check-in/out, we compared the well-known baseline clustering and classification methods perceptron, decision tree, k-NN, and k-means in the bike-sharing service region. For the imbalance problem, we compared AGC-MSP with the baseline methods ARMA and GBRT in demand prediction. For the rebalance problem, we compared PMM-EM with the Kalman filter and HMM in parameter optimization.

Baselines and evaluation method

The models proposed in our work solve the imbalance and rebalance problems through bike flow pattern prediction and parameter optimization. The methods are the ISVM algorithm for the multi-classification problem and the AGC algorithm, together with AGC-MSP and PMM-EM. To validate our models, we compare the proposed methods with the following approaches:

Perceptron: the perceptron is an algorithm for supervised learning of binary classifiers.

Decision tree: rules based on variable values are selected to get the best split to differentiate observations based on the dependent variable. Tree models where the target variable can take a discrete set of values are called classification trees.

GBRT: gradient boosting regression tree.

k-NN: k-nearest neighbor algorithm output depends on whether k-NN is used for classification or regression. In k-NN regression, the output is the property value for the object. This value is the average of the values of k-NNs.

Geographical grid: the city is divided into several grids.

k-means: k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

ARMA: the autoregressive moving average method is used to predict the stage of bikes in future time slots, which helps to identify the imbalance stage.

Kalman filter: an algorithm that uses a series of measurements observed over time to estimate unknown variables.

HMM: In a simple Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model (HMM), the state is not directly visible, but the output, which depends on the state, is visible.

To evaluate a realistic BSS scenario, the test data set is divided into two parts: sequential hour slots and anomalous time slots.

In this paper, we compared the proposed AGC, ISVM, AGC-MSP, and PMM-EM approaches with the baseline methods. AGC is based on density-based clustering, ISVM on a classification algorithm, AGC-MSP on classification and regression analysis, and PMM-EM on machine learning.

Evaluation metrics: To evaluate the performance of the methods, we adopt RMSE, RMSLE, error rate, precision, recall, and F1, which are widely used to measure the accuracy of classification and regression algorithms

\begin{matrix} \text{precision} = \frac{| TP |}{| TP | + | FP |}; \\ \text{recall} = \frac{| TP |}{| TP | + | FN |} \end{matrix}

(60)

\text{F1-score} = 2 \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}

(61)

RMSE = \frac{\sum_{i = 1}^{n} \sqrt{{({x'}_{in} - x_{in})}^{2} + {({x'}_{out} - x_{out})}^{2}}}{n}

(62)

RMSLE = \sqrt{\frac{\sum_{i = 1}^{n} {(\log ({x'}_{G_{i}} + 1) - \log (X_{G_{i}} + 1))}^{2}}{n}}

(63)

where $x'_{in}$ and $x'_{out}$ are the predicted numbers of bikes checked in to and checked out from Geohash-grid cluster $G_{i}$ during Δt

\text{Error rate} = \frac{\sum_{i = 1}^{n} | x'_{G_{i}} - X_{G_{i}} |}{\sum_{i = 1}^{n} X_{G_{i}}}

(64)

where $X_{G_{i}}$ is the observed number of bikes checked in to or checked out from Geohash-grid cluster $G_{i}$ during Δt.
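The classification and error metrics of equations (60), (61), (63), and (64) can be computed as follows. This is a minimal Python sketch with assumed function names and hypothetical toy inputs; it treats the primed quantities as predictions and the unprimed ones as observations.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from counts of TP, FP, and FN."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rmsle(predicted, observed):
    """Root mean squared logarithmic error over n grid clusters."""
    n = len(observed)
    squared = sum((math.log(p + 1) - math.log(o + 1)) ** 2
                  for p, o in zip(predicted, observed))
    return math.sqrt(squared / n)


def error_rate(predicted, observed):
    """Sum of absolute errors divided by the sum of observed counts."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / sum(observed)


# Hypothetical counts and predictions, used only to exercise the functions.
print(precision_recall_f1(tp=8, fp=2, fn=4))
print(rmsle([12, 8], [10, 10]), error_rate([12, 8], [10, 10]))
```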

In machine learning, clustering is an unsupervised learning approach, and its results are difficult to evaluate. In particular, how should the number of clusters n be chosen? It is usually chosen from knowledge and experience. In our experiments, we cluster all parking points separately for the Mobike and OFO BSSs.

Result on clustering

For simplicity and rapid modeling, we were inspired by the popular density-based methods DBSCAN (density-based spatial clustering of applications with noise) and OPTICS. In bike flow pattern clustering, we apply the idea of AGC to state label renewal. Based on the number of bike check-ins/outs, we compare k-means (k = 3) and geographically constrained label propagation (GCLP) with the geographical grid.³⁶

To predict demand in the BSS imbalance problem, we need to cluster the regions of bike check-out/in. k-means is the baseline clustering algorithm based on the distance between objects. In the dockless mode of bike sharing, however, parking points change with Δt. GCLP takes geographic constraints into account and propagates labels using a popular community detection algorithm. Table 11 shows that the time complexity of AGC is lower than that of GCLP. Therefore, we adopted AGC for the next step.

Table 11.

Complexity of clustering.

Method	Time complexity
AGC	$O (n / c)$
Geographical grid	$O (c)$
k-means	$O (n \log n)$
GCLP	$O (n^{2})$

AGC: adaptive Geohash-grid clustering; GCLP: geographically constrained label propagation.

Result on label classification

To determine the stage of the bike flow pattern for each region (grid or dock station), the initial parking data are transformed to { $x_{i Δ t}^{out}$ , $x_{i Δ t}^{in}$ } by Δt. After preprocessing, we adopt a simple binary classification method. For a fair comparison, we convert the multi-classifiers to binary classifiers. To evaluate the rebalance by parameter optimization, RMSE is used as the measure of prediction performance. In this paper, k = 2 denotes the binary classification problem (R_A and R_C) or (R_B and R_C), and k = 3 denotes the multi-classification problem. We compared ISVM with regression tree, k-NN, perceptron, and k-means. Figure 10 shows that perceptron achieves high accuracy when k = 2 and that our ISVM is well suited to multi-classification problems when k = 3.

Figure 10.

Label of static classified.

Result on predicted patterns

We compared the proposed AGC-MSP with the ARMA, ANN, and Kalman filter prediction algorithms.

The proposed AGC-MSP, which is based on density-based clustering and Markov models, can improve the prediction performance. The predicted over-demand state results are categorized as shown in Table 12.

Table 12.

True and false positive.

Prediction	Truth
	Over-demand	Over-supply
Over-demand	True positive	False positive
Over-supply	False negative	True negative

Based on the AGC-MSP results, we also predict the check-out/in stage with ARMA and GBRT. We choose two types of stages for AGC in our experiment: the over-supply stage and the over-demand stage. ARMA performs much better than GBRT. In addition, across all hours, GBRT is less affected by the time factor. AGC-MSP is clearly more accurate than ARMA and GBRT, as shown in Figures 11–13. Because of the sampling of the time window, the accuracy of AGC-MSP depends on a stable Δt and on prior knowledge of the historical data.

Figure 11.

Prediction of precision.

Figure 12.

Prediction of recall.

Figure 13.

Prediction of F1.

Result on parameter optimization

We compared PMM-EM parameter optimization with HMM and the Kalman filter. In the anomalous time series, PMM-EM performance was calculated using RMSLE and error rate.

In summary, PMM-EM, which builds on the results of AGC, ISVM, and the Markov state transition matrix, performs much better than the baseline methods. In terms of time complexity, the PMM-EM model needs to reach a stable transition by iterating from initial values of $π_{L}$ and $λ_{L}$ . Like pre-classified items in clustering, these initial values are often set by human experts. Fortunately, in our queuing model the bike flow obeys a Poisson distribution, so we can choose λ_L from historical data. With this prior knowledge, the optimal parameters are found in the convergence state as quickly as possible. With the PMM-EM algorithm, the RMSLE is 0.349 on the OFO data set and 0.371 on the Mobike data set. Since 0.371 > 0.349, PMM-EM performs better on the OFO data set than on the Mobike data set, as shown in Tables 13 and 14.

Table 13.

Demand of bikes’ check-out Geohash grid.

WX4G46J	All hours sequential data				Anomalous hours interval data
Metric	RMSLE		Error		RMSLE		Error
Company	Mobike	OFO	Mobike	OFO	Mobike	OFO	Mobike	OFO
HMM	0.387	0.372	0.439	0.451	0.353	0.355	0.453	0.489
PMM-EM	0.371	0.349	0.421	0.407	0.288	0.282	0.351	0.347
Kalman filter	0.386	0.369	0.423	0.425	0.311	0.314	0.371	0.375

RMSLE: root mean squared logarithmic error; HMM: hidden Markov model; PMM-EM: Poisson mixture model expectation-maximization.

Table 14.

Demand of bikes and check-in into Geohash grid.

WX4G46J	All hours sequential data				Anomalous hours interval data
Metric	RMSLE		Error		RMSLE		Error
Company	Mobike	OFO	Mobike	OFO	Mobike	OFO	Mobike	OFO
HMM	0.624	0.653	0.689	0.701	0.681	0.671	0.834	0.835
PMM-EM	0.365	0.350	0.408	0.402	0.297	0.290	0.353	0.340
Kalman filter	0.384	0.373	0.425	0.419	0.335	0.302	0.365	0.359

RMSLE: root mean squared logarithmic error; HMM: hidden Markov model; PMM-EM: Poisson mixture model expectation-maximization.

Conclusion and future works

Bike sharing is a means of transportation that provides services to residents through the mobile Internet, LBS, e-commerce, and other technologies. As the market develops and the number of customers grows, a huge amount of data is generated.

Different researchers take different approaches to BSSs. With the development of deep learning and ANN, traffic dispatching has become a hot topic at international conferences. However, those algorithms are not suitable for our framework, mainly because of the perspective from which features and factors are considered. For instance, genetic algorithms, simulated annealing, and heuristic algorithms perform well on nondeterministic polynomial time (NP)-hard problems, but our approach focuses on global features. In future research, we should address heuristic algorithms, such as particle swarm optimization, for optimizing BSSs. Besides optimization, differential privacy in deep learning is a new research area; commuters should be protected from being tracked and intercepted. As large-scale data sets are shared and the patterns of user behavior are examined, the overall arrangement of shared bikes can be continuously optimized. This provides a reference for the traffic planning of urban public services and alleviates urban congestion. It also provides a business siting scheme for pushing media advertising and recommending commodities.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key R&D Program of China under Grant No. 2017YFB1400800 and National Natural Science Foundation of China under Grant No. 61702046.

ORCID iD

Kui Yu

References

1. Kaufman, Gordon-Koven. Citi bike: the first two years. UTC J 2015; 24: 17–25.

2. Ciancia, Latella, Massink, et al. Exploring spatiotemporal properties of bike sharing systems. In: IEEE international conference on self-adaptive & self-organizing systems workshops, Cambridge, MA, 21–25 September 2015. New York: IEEE.

3. Liu. An optimal location model for a bicycle sharing program with truck dispatching consideration. In: IEEE international conference on intelligent transportation systems, Qingdao, China, 8–11 October 2014.

4. Zeng, Wang, et al. Improving demand prediction in bike sharing system by learning global features. In: Machine learning for large scale transportation systems (LSTS) @ KDD-16, San Francisco, CA, 13 August 2016.

5. Chen, Zhang, Wang, et al. Dynamic cluster-based over-demand prediction in bike sharing systems. In: UbiComp ‘16 proceedings of the 2016 ACM international joint conference on pervasive and ubiquitous computing, vol. 14, Heidelberg, 12–16 September 2016, pp. 841–852. New York: ACM.

6. Chang, et al. Bike sharing demand prediction using artificial immune system and artificial neural network. Soft Comput 2017; 1: 1–14.

7. Pal, Zhang. Free-floating bike sharing: solving real-life large-scale static rebalancing problems. Transport Res C: Emer 2017; 80: 92–116.

8. Lin J-R, Yang T-H. Strategic design of public sharing bikes systems with service level constraints. Transport Res E: Log 2011; 47(2): 284–294.

9. Sayarshad, Tavassoli, Zhao. A multi-periodic optimization formulation for bike planning and bike utilization. Appl Math Model 2012; 36(10): 4944–4951.

10. Crisostomi, Faizrahnemoon, Schlote, et al. A Markov-chain based model for a bike-sharing system. In: 2015 international conference on connected vehicles and expo (ICCVE), Shenzhen, China, 19–23 October 2015. New York: IEEE.

11. Bouveyron, Côme, Jacques. The discriminative functional mixture model for a comparative analysis of bike sharing systems. Ann Appl Stat 2016; 9(4): 1726–1760.

12. Groschwitz, Polyzos. A time series model of long-term NSFNET backbone traffic. In: Proceedings of the IEEE international conference on communications, ICC’94, New Orleans, LA, 1–5 May 1994, pp. 1400–1404. New York: IEEE.

13. Box, Pelham, Jenkins. Time series analysis, forecasting and control. 3rd ed. Upper Saddle River, NJ: Prentice Hall, 1990.

14. Mingyang, Lin. Better understanding the characteristics and influential factors of different travel patterns in free-floating bike sharing: evidence from Nanjing, China. Sustainability 2018; 10(4): 1244–1258.

15. Yang. Mobility modeling and prediction in bike-sharing systems. In: MobiSys ‘16 proceedings of the 14th annual international conference on mobile systems, applications, and services, vol. 18, Singapore, June 2016, pp. 165–178. New York: ACM.

16. Kaltenbrunner, Meza, Grivolla, et al. Urban cycles and mobility patterns: exploring and predicting trends in a bicycle-based public transport system. Pervasive Mob Comput 2010; 6(4): 455–466.

17. Froehlich, Neumann, Oliver, et al. Sensing and predicting the pulse of the city through shared bicycling. In: International joint conference on artificial intelligence, Pasadena, CA, 11–17 July 2009, pp. 1420–1426. New York: ACM.

18. Yao, Shen, et al. Demand estimation of public bike-sharing system based on temporal and spatial correlation. In: 2018 4th international conference on big data computing and communications (BIGCOM), Chicago, IL, 7–9 August 2018.

19. Ye xin, Zheng. Traffic prediction in a bike-sharing system. SIGSPATIAL ’15 Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems 2015; 4: 156–163.

20. Zhong, Chen, et al. SAFEBIKE: a bike-sharing route recommender with availability prediction and safe routing. In: Conference’17, Washington, DC, July 2017.

21. Cai, Xue, Mao, et al. Bike-sharing prediction system. In: El Rhalibi, Tian, Pan, et al. (eds) E-learning and games: Lecture notes in computer science, vol. 9654. Cham: Springer, 2016, pp. 301–317.

22. Froehlich, Neumann, Oliver. Measuring the pulse of the city through shared bicycle programs. In: International workshop on urban, community, and social applications of networked sensing systems (UrbanSense 08), Raleigh, NC, 4 November 2008, pp. 16–20, https://www.cs.umd.edu/~jonf/publications/Froehlich_MeasuringThePulseOfTheCityThroughSharedBicyclePrograms_UrbanSense2008.pdf

23. Come, Latifa. Model-based count series clustering for bike sharing system usage mining: a case study with the Velib’ system of Paris. Acm Transactions on Intelligent Systems 2014; 5(3): 39.1–39.21.

24. Long Biao, Chen, Zh. Dynamic cluster-based over-demand prediction in bike sharing systems. UbiComp ’16 Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Heidelberg 2016; 14: 841–852. Germany: ACM.

25. Ling, Qing. Rebalancing dockless bike sharing systems. Neural Computing & Applications 2018; 23: 59–63.

26. Lin, Yang, Chang. A hub location inventory model for sharing bikes system design: formulation and solution. Comput Ind Eng 2013; 65(1): 77–86.

27. Cherry, Han, et al. Electric bike sharing: simulation of user demand and system availability. J Clean Prod 2014; 85: 250–257.

28. Alvarez-Valdes, Belenguer, Benavent, et al. Optimizing the level of service quality of a bike-sharing system. Omega 2016; 62: 163–175.

29. Guoming, Lukasz. Bikeshare pool sizing for bike-and-ride multimodal transit. IEEE T Intell Transp 2018; 14: 5–21.

30. Chemla, Meunier, Wolfler. Bike sharing systems: solving the static rebalancing problem. Discrete Optim 2013; 10(2): 120–146.

31. Hernandez Pérez, Salazar González. A branch-and-cut algorithm for a traveling salesman problem with pickup and delivery. Discrete Appl Math 2004; 145: 126–139.

32. Anily, Hassin. The swapping problem. Networks 1992; 22: 419–433.

33. Archetti, Speranza, Hertz. A tabu search algorithm for the split delivery routing problem. Transport Sci 2006; 40: 64–73.

34. Szymczyk. Classification of geological structure using ground penetrating radar and Laplace transform artificial neural networks. Neurocomputing 2015; 148(19): 354–362.

35. Louppe, Geurts. Ensembles on random patches. In: Machine learning and knowledge discovery in databases, Bristol, 23–27 September 2012, pp. 346–361. Berlin: Springer.

36. El-Assi, Mahmoud. Effects of built environment and weather on bike sharing demand: a station level analysis of commercial bike sharing in Toronto. Transportation 2018; 44(3): 589–613.

1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18
57	9	16	4	8	20	7	38	21	11	9	38	82	6	3	8	14	1
A	C	B	C	C	B	C	A	B	B	B	A	A	C	C	B	B	C
19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36
98	15	18	36	4	51	72	8	12	9	5	39	3	7	45	36	29	22
A	B	B	A	C	A	A	C	B	B	B	A	B	B	A	A	A	A

