Abstract
This paper proposes a novel supervised clustering algorithm for analyzing large datasets. The algorithm models clustering as a matching problem between two disjoint sets of agents, namely centroids and data points. This view of the clustering problem allows the algorithm to be multi-objective, since each agent may have its own objective function. Here, the algorithm is used to maximize purity and similarity within each cluster simultaneously. It shows promising performance when tested on two transportation datasets. The first dataset contains speed measurements along a section of Interstate 64 (I-64) in the state of Virginia, and the second contains the bike station status of a bike sharing system (BSS) in the San Francisco Bay Area. We clustered each dataset separately to examine how traffic and bike patterns change within clusters, and then determined when and where the system would be congested (I-64) or imbalanced (BSS). Using a spatial analysis of these congestion states and imbalance points, we propose potential solutions for decision makers and agencies to improve the operations of I-64 and the BSS. We demonstrate that the proposed algorithm produces better results than classical k-means clustering algorithms when applied to our datasets with respect to a time event. The contributions of this paper are: 1) a multi-objective clustering algorithm; 2) an algorithm that is scalable (polynomial order), fast, and simple; and 3) an algorithm that simultaneously identifies a stable number of clusters and clusters the data.
INTRODUCTION
With the growth of new technologies, smart cities and urban areas are adopting advanced devices to control and monitor transportation networks and thus provide better service to the public and private sectors. These devices collect data through many sensors in the city’s infrastructure. Agencies and researchers exploring the massive amounts of collected data often find it challenging to draw meaningful conclusions due to the sheer size of the datasets. One way to deal with such data is to use clustering approaches. In the transportation field, operating agencies (such as departments of transportation) have been collecting data to improve the efficiency of the transportation network and provide better service for all transportation modes. Clustering the travel times or speeds of transportation modes could help operating agencies better manage the transportation network. In particular, the collected data could be reduced to a set of cluster centroids (i.e., the means of the clusters) that represent the entire dataset with respect to a time event such as time of day, day of month, or month of the year. This could help operating agencies answer several questions related to traffic operations, such as “Can we discriminate between recurrent congestion and outliers?” and “Can we identify how many time periods we need to plan for in terms of resource and congestion management?”
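As a rough illustration of what scoring a clustering "with respect to a time event" can look like, the sketch below computes cluster purity against a time-event label. It assumes the common definition of purity (the fraction of points whose label matches the majority label of their cluster); the exact formulation used later in the paper may differ. The function name `cluster_purity` and the toy labels are hypothetical and are not taken from the I-64 or BSS datasets.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, event_labels):
    """Return the fraction of points whose time-event label matches
    the majority label of the cluster they were assigned to.

    Assumption: this is the standard purity measure, not necessarily
    the exact objective used by the proposed algorithm.
    """
    if len(cluster_ids) != len(event_labels):
        raise ValueError("cluster_ids and event_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one data point is required")

    # Count how often each time-event label occurs inside each cluster.
    labels_by_cluster = defaultdict(Counter)
    for cid, label in zip(cluster_ids, event_labels):
        labels_by_cluster[cid][label] += 1

    # For each cluster, keep only the count of its most frequent label.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in labels_by_cluster.values()
    )
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six observations, two clusters, three time periods.
clusters = [0, 0, 0, 1, 1, 1]
periods = ["am_peak", "am_peak", "off_peak", "off_peak", "off_peak", "pm_peak"]
purity = cluster_purity(clusters, periods)
print(f"purity = {purity:.3f}")
```

A purity of 1.0 would mean every cluster contains observations from a single time period, while lower values indicate clusters that mix periods.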