

The difference between the time taken for a ride during the rush hours of working days and during the other hours of the day will be established statistically.

Based on this, we need to identify the areas in New York that have high peak travel times. These areas should be represented in a graph that shows the peak times of the different areas.

Our graph will represent the different areas based on their peak travel time.

y- peak travel time

x- area name
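As a minimal sketch of the comparison described above, the following Python snippet computes Welch's t statistic for rush-hour versus other trips and the mean rush-hour travel time per area (the y and x values of the planned graph). The trip records, the zone names, and the rush-hour windows (7-9 and 16-18 on working days) are all assumptions made for illustration; they are not TLC data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical trip records: (zone name, pickup hour, is working day, minutes).
# Placeholder values for illustration only; these are not TLC data.
TRIPS = [
    ("Zone A", 8, True, 30.0),
    ("Zone A", 13, True, 18.0),
    ("Zone A", 17, True, 34.0),
    ("Zone B", 9, True, 22.0),
    ("Zone B", 11, True, 15.0),
    ("Zone B", 18, True, 26.0),
    ("Zone B", 22, True, 12.0),
    ("Zone A", 8, False, 16.0),
]

# Assumed rush-hour windows on working days: hours 7-9 and 16-18.
RUSH_HOURS = set(range(7, 10)) | set(range(16, 19))


def is_rush(hour, working_day):
    """True when the pickup falls in an assumed rush-hour window."""
    return working_day and hour in RUSH_HOURS


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with unequal variances."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def mean_rush_minutes_by_zone(trips):
    """Mean rush-hour travel time per zone (y value for each x = area name)."""
    by_zone = defaultdict(list)
    for zone, hour, working_day, minutes in trips:
        if is_rush(hour, working_day):
            by_zone[zone].append(minutes)
    return {zone: mean(values) for zone, values in by_zone.items()}


rush = [m for _, h, w, m in TRIPS if is_rush(h, w)]
other = [m for _, h, w, m in TRIPS if not is_rush(h, w)]
t_stat = welch_t(rush, other)
peak_by_zone = mean_rush_minutes_by_zone(TRIPS)
```

A large positive `t_stat` would suggest that rush-hour rides take longer; a p-value would additionally require the t distribution with Welch-Satterthwaite degrees of freedom, which this sketch leaves out.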

We will analyze the data using different statistical and machine learning models, such as the Kalman filter, the Kaplan-Meier model, the Conditionally Gaussian Observed Markov Fuzzy Switching Model, survival analysis, and the Hidden Markov Model.

Negative binomial and Poisson distributions.

Time series analysis

A Review of Statistical Analysis in Traffic Prediction

Nikhil Teja Bibinagar - 2859717

Sai Krishna Dumpala - 2860098

Civil Engineering Department, Cleveland State University

CVE 593 - Special Topics In Civil Engineering

Kiddo Emmanual

10/30/2022

TABLE OF CONTENTS:

1. Abstract

2. Definition of Some Terms

3. Problem Statement

4. Topic Background

5. NYC Traffic Congestion

6. Scope of the Paper

7. Method

8. A Review

8.1 A Measure of Traffic Congestion

9. Literature Review of Prediction Models

9.1 Basic Terminology of Deterministic Models

9.2 Basic Terminology of Statistical Models

9.3 Machine Learning & Decision-Making Models

10. Related Case Studies

11. Conclusions

12. References

Abstract:

In today's world, major cities are becoming clogged by excessive traffic, which has become a significant problem in highly developed cities. This paper focuses on analyzing the current situation in New York City and proposing an alternative model, based on statistical and deterministic analysis of earlier data, to reduce these traffic problems. For this study, we searched research databases such as Google Scholar, JSTOR, Elsevier, and ResearchGate and obtained relevant material. We reviewed NYC TLC data from January 2022 to July 2022 to assess and improve traffic conditions on the roadway. Additionally, this paper reviews various traffic prediction models, including the Kalman filter, the Kaplan-Meier model, the Conditionally Gaussian Observed Markov Fuzzy Switching Model, survival analysis, and the Hidden Markov Model, which were developed and employed for different environments. The entire study is based on NYC Taxi and Limousine Commission (TLC) data, with the aim of developing a better alternative model.

Keywords: Traffic Congestion, Kalman filter, Kaplan-Meier model, Conditionally Gaussian Observed Markov Fuzzy Switching Model, survival analysis, Hidden Markov Model, Taxi and Limousine Commission (TLC)

Definition of Some Terms:

Google Scholar: A service that provides a simple way to search scholarly literature broadly. It is a bibliographic database owned by Google.

JSTOR (Journal Storage): An electronic archive of leading scholarly journals from various academic disciplines. It is part of ITHAKA, a non-profit organization that also includes Artstor, Ithaka S+R, and Portico.

Elsevier: A leading information and analytics provider for various customers. It is a Netherlands-based academic publishing company specializing in scientific, technical, and medical content.

ResearchGate: A European-based social networking site for scientists and researchers, where about 20 million research publications from various platforms and backgrounds are stored.

Traffic Congestion: An excessive number of vehicles on a particular portion of roadway.

Congestion Index: The number of vehicles divided by the area. It measures the density of vehicles in a particular area.

Urbanization: The process by which people migrate to areas that have a central business district and better employment opportunities.

Transportation Network: A combination of buses, cars, trains, and other vehicles.

Traffic Speed: The distance traveled by a vehicle per unit of time, usually expressed in miles per hour (mph).

Traffic Volume: The number of vehicles passing a given point during a given period.

Traffic Level: Also known as Level of Service (LOS), a qualitative measure used to describe the quality of motor vehicle traffic service.

Clogged Roads: A situation in which vehicles cannot move forward because of a traffic jam.

Texas Transportation Institute (TTI): An agency of the State of Texas and a member of the Texas A&M University System that addresses complex transportation challenges and opportunities.

Traffic Delay: The additional time a driver or pedestrian spends waiting compared with regular hours.

Federal Highway Administration (FHWA): An agency that provides stewardship over the construction, maintenance, and preservation of highways, bridges, and tunnels.

Road Accident: An incident in which an automobile collides with another automobile and causes harm to people.

Taxi and Limousine Commission (TLC): The New York City government agency responsible for licensing taxis and for-hire vehicles.

Peak Hours: The busiest hours of the day.

Uber: A technology company that connects the digital and physical worlds to help movement happen. It is a publicly traded company based in San Francisco.

Lyft: A ridesharing application that allows members of the public to share their rides with others. It is a publicly traded company based in San Francisco.

Demand Capacity: The condition in which demand is higher than capacity, where capacity is the maximum traffic flow a road section can carry during a given period.

Delay Travel Time: The difference between the actual time taken and the expected time for the vehicle to reach the given location.

Cost: The additional cost incurred because of delay.

Recurrent Traffic: A type of congestion in which traffic jams occur frequently.

Non-recurrent Traffic: A type of congestion in which traffic jams occur occasionally.

Transportation Planning: The process of improving the transportation system in a city by studying current statistics.

Traffic Intensity: The concentration of vehicles at a particular node or junction.

Traffic Expanse: The volume of the congested roads.

Travel Time Index Ratio (TTIR): The ratio of travel time during the busiest hours to travel time under free-flow conditions.

Vehicle Hours Traveled (VHT): The total time, in hours, that vehicles spend traveling on a road section. It is used to assess the performance of the road section.

Vehicle Miles Traveled (VMT): The total number of miles traveled by vehicles in a particular region.

Problem Statement:

Due to urbanization and industrialization, traffic congestion has become a severe problem for commuters and pedestrians. One of the prime causes of traffic congestion is people's desire to travel in their own vehicles (Das & Nayyar, 2019). This preference for convenience and vehicle ownership affects the development and environment of those cities by increasing travel time. Additionally, it irritates drivers and becomes one of the prime causes of accidents. Concentrating on traffic flow and movement offers only slight chances to alter the flow. Congestion can instead be addressed by improving the road network, which is one of the costliest approaches, or through traditional traffic techniques such as trend analysis and traffic forecasting, which are cheaper (Kolak & Wach, 2018; Das & Nayyar, 2019; Nagy & Simon, 2021).

Another scholar suggested that providing commuters and pedestrians with traffic flow data such as traffic speed, volume, and LOS (level of service) is more beneficial than conventional trend analysis, which only identifies recurring jams. When made available to the public, these traffic congestion parameters help travelers choose the optimal path and avoid excess travel time.

This paper will analyze New York City's current situation and will try to develop a better model.

Topic Background:

Turner et al. (2019) assert that previously acknowledged inadequacies in traffic congestion assessment impeded our comprehension of the factors contributing to congestion and restricted the execution of operational interventions and policy responses, but that traffic data is now plentiful and regularly available. The authors argue that this is due, in part, to big data created by continuously streaming data from mobile communication services (Turner et al., 2019; Ranjan et al., 2020). For example, hundreds of cities worldwide may now be observed using similar approaches for monitoring congestion. In New York City, the Taxi and Limousine Commission (TLC) provides traffic data about yellow cab trips through its publicly available trip records.

The Texas Transportation Institute (TTI) publishes the Urban Mobility Report (UMR) every year to monitor traffic levels in more than 300 municipalities and metropolitan areas across the country. TTI maintains that traffic congestion is a significant and growing issue in American metropolises. TTI estimated the cost of traffic delays for an individual traveler in 2018 at approximately $1,080, twice the inflation-adjusted figure for 1982 (Lomax et al., 2021). Each commuter lost almost four times as much time due to traffic, up to 54 hours per year, and more than three times as much gasoline was wasted, up to 3.3 billion gallons per year (Lomax et al., 2021). Transportation experts have confirmed that traffic congestion is a severe issue in major cities, where traffic conditions are worsening for residents and commuters (Turner et al., 2019).

Thus, traffic congestion has many harmful effects on individuals and the environment, including long trip times and high travel costs for passengers and shipping firms, high energy consumption and carbon emissions, air and noise pollution, high passenger stress, inactivity, and an adverse effect on economic advancement (Kolak & Wach, 2018; Xiong et al., 2018). Congestion also affects how much it costs to hire people, how long they work, and how long it takes to deliver goods (Nagy & Simon, 2021). Most importantly, traffic jams significantly affect people's quality of life.

The leading causes of congestion are more people wanting to travel and more people owning cars. These result from rapid industrialization, urbanization, and population growth. The U.S. Federal Highway Administration (FHWA) found that there are seven main reasons for traffic jams: not enough highway space, poorly maintained traffic control infrastructure, changing demand throughout the day, special events (like games), road accidents, construction, and unfavorable weather conditions (FHWA, 2022). FHWA argues that our comprehension of traffic congestion needs to be more specific and that more research is needed to find the right policies and corrective steps to deal with the problem. In reality, people have disagreed about how big the problem is.

Jayasooriya and Bandara (2017) say that congestion has a "small overall cost, adding at most 2% to overall travel time and gas expenses." Lu et al. (2021) assert that the actual cost of traffic is less about lost time and gas and more about making it harder to get to the places most important to us (work, play, love). This is linked to a lower level of income and wealth. Kolak and Wach (2018) assert that cities with strong economies have more traffic problems than cities with weak economies. Marshall and Dumbaugh (2018) also conclude that congestion and productivity are closely linked. They say that getting rid of or reducing congestion is a waste of time because it does not hurt cities. Chang (2017) asserts, "Traffic jams are not a problem at their core. It solves our main problem with getting around."

Figure 1 below presents a conceptual framework of the cause-effect correlation assessment of traffic congestion. It clearly shows the links between factors contributing to traffic congestion in major cities.

(Source: Rahman et al., 2022).

NYC Traffic Congestion:

In highly populated cities like New York, traffic congestion has become a significant problem. The TLC governs the taxi traffic situation and tries to improve it by reducing the number of taxis and their operations.

The statistics show that traffic congestion has worsened. For instance, the average speed during peak hours was 4.7 mph in 2016, whereas in 2015 it was 5.6 mph. This slow-moving traffic can be considered one of the factors behind the increasing number of for-hire automobiles on the road. Additionally, the number of vehicles licensed in 2017 was 100,000, including 69,000 yellow taxis and 31,000 black automobiles, an average rise of 20,000 in two years. As a result, the wait time for yellow cabs has increased from 5.55 minutes in 2015 to 7.3 minutes.

Traffic congestion has affected the taxi business through longer wait times and lower average speeds. As a result, the salaries of taxi drivers decreased from $35,000 in 2015 to $29,000 in 2016. Additionally, due to the rise in demand for Uber and Lyft, which offer comfort and flexibility, taxis have vanished or become scarcer in significant parts of the city.

Due to the rise in alternative modes of travel, taxi companies suffer the most. As a consequence, fewer people are willing to ride in taxis or to work as taxi drivers.

Since 2011, the TLC has reduced taxi and limousine trips by more than 20%. This was achieved by establishing the Taxi and Limousine Innovation Lab, which provides new technology to improve taxi and limousine functions. This innovation also helped develop a new idea called "Hail a Cab," which lets the public book a taxi or limousine using their mobile phones. In addition, the maximum cruising speed of taxis and limousines went down from 12 mph to 10 mph because of the TLC.

Scope of the Paper:

The paper will focus on identifying a better planning model for New York City by conducting advanced research through search engines and by using Python software for statistical analysis and visualization. We will use yellow cab data from the Taxi and Limousine Commission for the statistical analysis to determine the problems in the planning model of the city of New York.

Method:

We turned to search engines and databases, including Google Scholar, JSTOR, Elsevier, and ResearchGate, to find pertinent information. Additionally, publications from different periods were examined to adequately represent both recent efforts and early seminal studies. This analysis does not, however, claim to be exhaustive. It focuses only on the effectiveness of adaptive intelligence technologies in managing traffic congestion. Nevertheless, more than 35 helpful research articles were found in the search results. The sections that follow cover these.

A Review:

8.1 A Measure of Traffic Congestion

Traffic congestion occurs when too many cars are on the road due to inadequate road infrastructure, excessive demand for travel, or an inefficient flow of transportation modes (Wan et al., 2017). According to Wang et al. (2018), the definition of congestion is a function of three elements: demand capacity, delay-travel time, and cost. Demand capacity refers to the situation in which demand is greater than capacity. When there is a delay, the time taken to reach the destination increases. The time taken, including the delay time, is the delay travel time. Finally, the cost is the added expense of traveling. In the authors' opinion, none of these definitions adequately conveys the situation of congestion on its own; instead, they should be used in conjunction with one another to analyze its causes and effects.

Scholars divide traffic congestion into two categories: recurrent and non-recurrent. Recurrent traffic congestion occurs regularly due to limited road capacity, insufficient traffic control devices, or excessive travel. In contrast, non-recurrent traffic congestion occurs irregularly due to unplanned circumstances, such as accidents, extraordinary events, and stormy weather (Bharadwaj et al., 2017; Fiedler, 2017; Di et al., 2019; Yang et al., 2017). Non-recurrent congestion happens less frequently and is typically out of the control of the organizations in charge of transportation planning. Daily congestion, on the other hand, causes people extreme frustration by interfering with their routine activities (Chang et al., 2017). Recurrent congestion falls within the planning horizon and might be resolved with sufficient funding and political will if addressed correctly.

Prior studies used a range of metrics to measure the breadth and depth of congestion, focusing either on specific links (highway sections) and nodes (junctions) or on the overall transportation system (Bharadwaj et al., 2017; Chang et al., 2017; Jain et al., 2017; Wan et al., 2017). Congestion measurement at the link or node level is part of the micro-level analysis required for localized traffic improvements.

Network-wide data are required to approximate traffic congestion at the macro- or meso-level. Different approaches for quantifying congestion in urban areas (UAs) take into account each of the three characteristics of congestion: intensity (degree of severity), expanse (volume of the congested highways), or period (sum of congested hours) (Rahman et al., 2022). The most popular approaches (Rahman et al., 2022; Chang et al., 2017; Jain et al., 2017) are:

Travel Time Index Ratio (TTIR)

Percent of Congested Lane Miles

Peak Traffic Period Duration

Vehicle Hours Traveled (VHT)

Average Traffic Speed

Hours of Delay

Vehicle Miles Traveled (VMT)

Cost of Congestion
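Several of the measures listed above reduce to simple arithmetic on a road segment. The Python sketch below shows one common way to compute the travel time index, VMT, VHT, and hours of delay. The function names and the segment values (a 2-mile segment, 1,200 vehicles, 12 minutes at peak, 6 minutes at free flow) are hypothetical and are not taken from the reviewed studies.

```python
def travel_time_index(peak_minutes, free_flow_minutes):
    """Ratio of peak-period travel time to free-flow travel time."""
    return peak_minutes / free_flow_minutes


def vehicle_miles_traveled(vehicles, segment_miles):
    """Total miles driven by all vehicles on the segment."""
    return vehicles * segment_miles


def vehicle_hours_traveled(vehicles, travel_minutes):
    """Total hours spent by all vehicles on the segment."""
    return vehicles * travel_minutes / 60.0


def hours_of_delay(vehicles, peak_minutes, free_flow_minutes):
    """Total extra hours spent compared with free-flow conditions."""
    return vehicles * (peak_minutes - free_flow_minutes) / 60.0


# Hypothetical segment: 2 miles long, 1,200 vehicles in the peak period,
# 12 minutes to traverse at peak, 6 minutes at free flow.
tti = travel_time_index(12.0, 6.0)
vmt = vehicle_miles_traveled(1200, 2.0)
vht = vehicle_hours_traveled(1200, 12.0)
delay = hours_of_delay(1200, 12.0, 6.0)
```

A travel time index of 2.0 here would mean that a peak-period trip takes twice as long as the same trip at free flow.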

9. Literature Review Of Prediction Models:

9.1 Basic Terminology of Deterministic Models:

In deterministic models, a certain amount of absoluteness or certainty is involved. We can find the result with absolute confidence in deterministic models, provided we have all the required data.

Pros

Financial experts use deterministic models to determine the rise or fall of a stock because there is certainty involved here.

Deterministic models are comparatively less complex than stochastic models.

Xu et al. (2017) researched whether the Autoregressive Integrated Moving Average (ARIMA) model and the Kalman filter can be applied to improve the accuracy of real-time traffic condition forecasts. The authors used data from the Beijing Traffic Control Center to examine the accuracy of the forecasts produced by the two approaches. The researchers believe that traffic prediction systems can provide accurate real-time information to drivers and passengers, helping them choose the most efficient path and reduce the time spent traveling. Xu et al. (2017) built an ARIMA model of road traffic as a time series, using historical road traffic data as their primary data source. They combined it with the Kalman filter to develop an algorithm for forecasting the state of road traffic. This method derives the state, measurement, and update equations associated with the Kalman filter. The research presents a new approach to forecasting the current state of road traffic in real time.

Furthermore, the accuracy of forecasts may be improved with the Kalman filter, which analyzes data in a manner that accounts for the inherent uncertainty. Because the method suggested by Xu et al. (2017) has been validated using data collected from real-world traffic, it is a good candidate for addressing the traffic congestion challenges that other well-established cities face. According to the data, the recommended technique has the potential to improve forecast accuracy by up to 25 percent compared to the conventional ARIMA model.
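To make the predict and update steps of a Kalman filter concrete, the following is a minimal one-dimensional sketch in Python. It assumes a random-walk state model and scalar noise variances chosen by hand; it does not reproduce the ARIMA-based state equation of Xu et al. (2017), and the function name, parameters, and readings are hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate, initial_var):
    """Scalar Kalman filter with a random-walk state model (an assumption).

    Returns the filtered estimate after each measurement.
    """
    estimate, var = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        var = var + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        var = (1.0 - gain) * var
        estimates.append(estimate)
    return estimates


# Hypothetical readings: three identical observations of 10.0, starting
# from a prior estimate of 0.0 with variance 1.0 and no process noise.
filtered = kalman_1d([10.0, 10.0, 10.0], process_var=0.0,
                     measurement_var=1.0, initial_estimate=0.0,
                     initial_var=1.0)
```

With no process noise, the filter behaves like a running average that treats the prior as one extra observation, so the estimates move steadily from the prior toward the measured value.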

In another study, Emami et al. (2019) gathered data from connected vehicles and built a Kalman filter to predict traffic flow along city arterials. The proposed method optimizes the use of computing resources. It delivers a real-time forecast by accessing data from connected cars just before the time at which the forecast is to be created. In addition, the model estimates traffic flow for a range of connected-vehicle penetration rates (the proportion of connected cars in the entire fleet). The approach is tested using a range of statistical indicators to establish how effectively the algorithm works in various traffic situations and penetration rates (Emami et al., 2019). The research offers an inexpensive method for predicting short-term traffic flow using data from connected automobiles.

Figure 2 shows a layout of two consecutive junctions.

(Source: Emami et al., 2019).

Liu et al. (2020) conducted a study to build a traffic congestion and duration estimation model based on survival and regression analysis. They aimed to develop a model that identifies the elements leading to traffic congestion and predicts how long a current traffic jam will last at any particular moment. The scholars constructed two models to better forecast the time spent in traffic congestion. Based on related data, traffic congestion is growing more severe; thus, estimating precisely when traffic congestion will occur is a chief concern (Ranjan et al., 2020). In this view, the authors apply a multivariate linear regression approach to develop the traffic congestion estimation model for the present data, which produces the actual traffic congestion condition. To construct the duration model of traffic congestion, the non-parametric Kaplan-Meier model from survival analysis is used to derive the survival function of traffic congestion duration. Liu et al. (2020) found that the degree of fit between the estimated and actual values of the model is greater than 0.96, which supports the conclusion that the road traffic congestion degree and duration model is capable of identifying the features of traffic distribution and duration.

Zhao & Hu (2019) apply survival analysis to predict how long a traffic jam will last. Survival analysis is a subfield of statistical analysis that attempts to estimate the time that will pass before a specific event occurs (Phanhong et al., 2020). The aim was to understand the phenomenon of traffic congestion from the beginning of the congestion until its conclusion. The authors estimate the survival function using the Kaplan-Meier estimator. The survival function is the probability that congestion will continue beyond a certain period (Lyu & Lin, 2022). The survival analysis findings indicate a strong association between the duration of traffic congestion and the following variables: time of day, day of the week, and weather conditions (Zhao & Hu, 2019). However, the model has several limitations that must be considered. First, the model is constructed using data from a particular city, so it may not apply to other cities whose environments differ. Second, the model takes into account a restricted number of the elements that are relevant to traffic congestion. Third, other variables, such as the number of lanes on a road or the speed limit, may also influence congestion, but the model does not consider them. In addition, the model does not consider the impact of mitigating measures, such as road charging, on the extent of traffic congestion.
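The Kaplan-Meier estimator mentioned above multiplies, at each observed end time, the fraction of congestion episodes that are still ongoing. The Python sketch below is a minimal illustration under our own assumptions: the episode durations and the censoring flags are invented for the example and do not come from Zhao & Hu (2019) or Liu et al. (2020).

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: length of each congestion episode (any time unit).
    observed:  True if the end of the episode was seen, False if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all episodes that share the same duration.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both ended and censored episodes leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical congestion durations in minutes; one episode is censored.
durations = [10, 20, 20, 30, 40]
observed = [True, True, False, True, True]
curve = kaplan_meier(durations, observed)
```

In this toy example the estimated probability that a jam lasts longer than 20 minutes is about 0.6, since the censored episode stays in the risk set at 20 minutes but is not counted as an ending.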

9.2 Basic Terminology of Statistical Models:

In these models, we reach the results using the data and a probabilistic approach. Stochastic models possess some amount of randomness. Certainty is expressed in terms of probability.

Pros

Stochastic models do not zero in on one particular result but most likely give a range. This gives us a better estimate of our test results. The accuracy is higher, but the model identifies a range.

We often need complete data to make a deterministic decision, so stochastic models are more widely used since data is scarce in many domains.

Courtesy WallStreetMojo.com

The Hidden Markov Model is an effective modeling method for sequential data. Traffic management and route planning may benefit from the capacity to precisely estimate the most probable path a car will take (Ye et al., 2015). The results of the different research articles indicate that the HMM-based technique is a potential tool for predicting driving routes.

Based on Hidden Markov Models and a contrast measure, Zaki et al. (2020) propose a strategy for anticipating traffic congestion during rush hours in a 2D space. The approach predicts traffic congestion based on the idea that the contrast between two photographs can be used as an indicator. The authors train a Hidden Markov Model using a dataset of Google Street View traffic photos. The algorithm is then applied to new photos to anticipate traffic congestion. The authors examine the model's performance on a test set of traffic photos and determine that it achieves an 80.6% prediction accuracy. Finally, Zaki et al. (2020) examine the suggested technology's possible uses and propose that it might be used to enhance traffic regulation and control.

Ye et al. (2015) describe a technique for driving route prediction based on the Hidden Markov Model (HMM). The authors maintain that the approach may correctly anticipate a vehicle's whole path as early as feasible in a trip's duration without inputting origins and destinations. They present the design of a route recommendation system in which route predictions play a significant role and then establish a road network framework. They then normalize each driving route in a rectangular coordinate system and construct the HMM to prepare for route predictions by extending the training set using K-means++ and the add-one (Laplace) smoothing methodology (Ye et al., 2015). The authors explain how the HMM may be used to forecast the most probable path a motorist will take, given their starting point and final destination. Using real-world data from the Beijing Urban Traffic Control Center, the authors assess the accuracy of the HMM-based technique. In terms of accuracy, the HMM-based technique surpasses other approaches, such as the shortest route and k-nearest neighbors.

Using a Conditionally Gaussian Observed Markov Fuzzy Switching Model, Bouyahia et al. (2022) propose a mechanism for forecasting traffic conditions. The model assumes that a Markov process governs traffic states and that traffic state measurements are subject to Gaussian noise. The suggested model can capture the non-stationary and nonlinear characteristics of traffic data and may be used to forecast future traffic conditions (Bouyahia et al., 2022). The model is verified using real-world traffic data, and the findings demonstrate that the proposed model outperforms current approaches for predicting traffic status. The suggested technique uses a triplet model of traffic consisting of traffic flow, speed, and a switching process to infer the parameters governing traffic dynamics (Bouyahia et al., 2022). The authors analyze the effect of explicitly including fuzzy switching mechanisms on the predictability of traffic data. The approach tries to solve the deficiencies of numerous current prediction techniques, in which the precise modeling of traffic dynamics limits prediction accuracy. Experiments demonstrate that the suggested method gives good results over a forecasting horizon of about 60 minutes using data acquired at periodic intervals of 15 minutes (Bouyahia et al., 2022).

Pavlyuk's (2017) study adds to the characterization of spatial dependence regimes in urban traffic flows. Pavlyuk states that, in both traffic flow theory and empirical investigations, the significance of traffic flow regimes for forecasting and the existence of spatial linkages between nodes of a road network are well recognized. In his research, the author combines these notions and takes the initial steps toward analyzing the various spatial dependence regimes in a traffic flow. Markov-switching autoregressive distributed lag models are used to analyze the model structure in various traffic flow regimes (Pavlyuk, 2017). Based on the models, he concluded that traffic flow regimes are significant in identifying spatial dependencies. The approach is illustrated with actual traffic flow data. The research provides a straightforward traffic flow forecasting technique based on regime-switching models. The author began by reviewing the current literature on traffic flow forecasting, concentrating on methods based on artificial neural networks, support vector machines, and fuzzy logic (Pavlyuk, 2017). The author then discussed the regime-switching model and its application to forecasting. The author evaluated the approach on a real-world data set and found that it surpassed other methods in terms of precision and computing efficiency. The article offers a clear and short summary of the strategy and its benefits, together with a comprehensive explanation of the regime-switching model and its application to traffic flow forecasting. The experimental findings are positive, suggesting that the approach may be valuable for anticipating short-term traffic flow (Pavlyuk, 2017).

Another study by Sun et al. (2018) examined the generic nonlinear car-following model with several time delays by considering both nonlinear and linear assessments of traffic wave stability. For the generic nonlinear car-following model, platoon stability and string stability requirements are derived. Using the reductive perturbation approach, the Burgers equation and the Korteweg-de Vries (KdV) equation and their associated solitary wave solutions are determined. The authors investigate the analytic and numerical features of a standard optimal velocity model, which assesses the effect of disturbances on the development of traffic congestion. The numerical findings of Sun et al. (2018) indicate that traffic flow stability is more sensitive to delays in sensing relative motion than to delays in sensing the host vehicle's motion.

9.3. Machine Learning & Decision-Making Models

Kaplan Meier Curve

It is a kind of survival analysis model.

It is used to represent the survival rate, or survival function, graphically.

It is used, for example, in molecular biology, where the aim is to determine the time taken for a disease to defeat the immune system.

We compare two groups over time and contrast the survival curves of both.

We calculate the hazard ratio (HR).

Here we identify the gaps between the fitted curves in the horizontal and vertical directions and the orientation of the curves.

Hidden Markov model

This model is named after Russian scientist Andrey Andreyevich Markov.

When a sequence of observations is involved, Hidden Markov and Markov models come into play.

It is used for discrete data.

It has two kinds of probabilities:

1) Transition Probability

2) Emission probability
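As a minimal sketch of how transition and emission probabilities work together, the Python snippet below uses the Viterbi algorithm to find the most likely sequence of hidden traffic states given a sequence of observations. The two hidden states ("free", "congested"), the two observation symbols ("fast", "slow"), and every probability value are assumptions made for this illustration; they are not estimated from any data set.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] holds the best probability and path that end in state s.
    best = {
        s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1])
                 for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical model: all probabilities below are illustrative assumptions.
STATES = ("free", "congested")
START = {"free": 0.6, "congested": 0.4}
TRANSITION = {
    "free": {"free": 0.7, "congested": 0.3},
    "congested": {"free": 0.4, "congested": 0.6},
}
EMISSION = {
    "free": {"fast": 0.9, "slow": 0.1},
    "congested": {"fast": 0.2, "slow": 0.8},
}

probability, path = viterbi(["fast", "slow", "slow"], STATES,
                            START, TRANSITION, EMISSION)
```

For the observation sequence fast, slow, slow, the most likely hidden sequence under these assumed values is free, congested, congested.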

Courtesy Researchgate.net

Kalman filter

Rudolf E. Kalman developed this model. It is an estimation algorithm.

This model was first used in the Apollo program.

The response variable is of a continuous data type.

The Kalman filter is used in multiple areas, such as

1. Navigation systems

2. Computer vision systems

3. Signal processing

The variable of interest is estimated indirectly from noisy measurements.

Decision Tree Modelling

A decision tree corresponds to a flowchart-like algorithm that begins with a single idea and branches further down.

A decision tree is an upside-down tree.

The decision tree can be divided into further components:

1. Root node - The beginning of the decision tree

2. Decision node

2.1 This is where the decision is made

2.2 Nodes that we get after splitting the root node

3. Leaf node - A node where further splitting is not possible

4. Subtree - A subsection of the decision tree

Pruning - Cutting down some nodes to prevent overfitting

Entropy

1. Entropy is the uncertainty in the data set.

2. Decision-making becomes difficult when entropy is high.

3. The higher the entropy, the greater the disorder in the sample set.
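Entropy is what a decision tree uses to choose its splits: a split is good when it lowers the entropy of the resulting groups. The sketch below computes Shannon entropy and the information gain of one candidate split. The labels and the "rush hour" split are hypothetical illustration data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy reduction achieved by splitting `parent` into the `children` groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical sample: 4 congested and 4 free-flowing observations.
parent = ["congested"] * 4 + ["free"] * 4
# Candidate split on "is it rush hour?": each branch is purer than the parent.
rush_hour = ["congested", "congested", "congested", "free"]
off_peak = ["congested", "free", "free", "free"]

gain = information_gain(parent, [rush_hour, off_peak])
print(f"parent entropy = {entropy(parent):.3f} bits, information gain = {gain:.3f} bits")
```

A tree-building algorithm would evaluate every candidate split this way and keep the one with the largest gain.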

10. Related Case Studies:

By separating a route into independent fragments and predicting the origin-destination-based traffic flow behavior of each fragment, Jain et al. (2017) were able to anticipate congestion on urban highways and sub-arterial roads in a city with heterogeneous traffic. The predicted travel time for a journey is modeled from partial traffic attributes (flow and speed) at the origin and destination of each road fragment; roadway and segment traffic factors, including diverging routes, are also explored to account for travel time (Jain et al., 2017). The authors limited observation locations along the route to a few selected nodes rather than covering the whole distance, and they developed a multiple linear regression model to forecast travel time on a given fragment. A Congestion Index, a measure of congestion that is independent of route length, was established by incorporating the free-flow time for each fragment, which was approximated using, among other techniques, night-time field observation data (Jain et al., 2017). The method was applied to a major route in Delhi. Special events were analyzed to produce four candidate models, one of which was selected on the basis of the best statistical fit (adjusted R² = 0.917). It was tested using the root mean squared error (RMSE), which came to 7.2% (Jain et al., 2017). The method has several strengths, including accounting for vehicle origins and destinations. This matters because the origins and destinations of vehicles are primary factors influencing the amount of traffic congestion. The model also estimated traffic congestion accurately, demonstrating its viability (Jain et al., 2017). However, the approach has limitations: because it considers only vehicle origins and destinations, it cannot account for other variables that may influence congestion, such as weather or time of day.
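The two quantities discussed above can be illustrated as follows. This review does not reproduce the exact formula used by Jain et al. (2017), so the sketch assumes one common travel-time-based definition of a congestion index, (travel time - free-flow time) / free-flow time, which does not depend on fragment length. The function names and all numbers are hypothetical.

```python
import math


def congestion_index(travel_time, free_flow_time):
    """Assumed definition: extra travel time as a fraction of free-flow time."""
    if free_flow_time <= 0:
        raise ValueError("free_flow_time must be positive")
    return (travel_time - free_flow_time) / free_flow_time


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Hypothetical travel times (minutes) for three road fragments.
predicted = [12.0, 18.5, 9.0]   # e.g. output of a regression model
observed = [11.0, 20.0, 9.5]    # field measurements
free_flow = [8.0, 10.0, 6.0]    # e.g. night-time observations

indices = [congestion_index(t, f) for t, f in zip(predicted, free_flow)]
error = rmse(predicted, observed)
print("congestion index per fragment:", [round(i, 2) for i in indices])
print(f"RMSE = {error:.2f} minutes")
```

Under this assumed definition, an index of 0 means free flow and an index of 1 means the trip takes twice as long as it would under free flow.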

Another study by García-Ramírez (2020) aimed to create a traffic congestion model for the city of Quito, Ecuador, using Google traffic data. To create the model, the author used one year of Google Maps data (January to December 2016). To quantify traffic congestion, the study employed four parameters: journey duration, traffic intensity, space mean speed, and density (García-Ramírez, 2020). According to the author, numerous variables affect traffic congestion in Quito, including the time of day, holidays, and weather. The author concluded that the approach might be used to forecast traffic congestion in other locations around the world (García-Ramírez, 2020).

One of the study's strengths is its use of real-world data: by using Google Maps data, the author was able to construct a model based on actual traffic conditions. Another strength is its focus on a single city, which allowed the model to be tailored to traffic conditions in Quito. Nonetheless, the research has weaknesses. It depends on Google Maps data, and while this data is typically accurate, inaccuracies are always possible. Its focus on a single city is also a limitation: the findings are relevant to Quito, but they may not apply to other cities, including other Ecuadorian cities.

11. Conclusions:

We reviewed around 38 research journals and papers in our literature review. Different authors hold diverse views about traffic congestion in different metropolitan cities, and they have suggested different models as alternative methods for decongestion.

We will perform a statistical analysis of our data set similar to the analyses carried out with the different models in these papers. Our analysis will use the decision tree model, the random forest model, linear regression methods, the Hidden Markov Model, and the Kalman filter. We will choose the model that gives the best results.

The models used to analyze traffic congestion include both statistical and deterministic methods for studying a city, and these have helped efforts toward decongestion.

Additionally, the reviewed results showed that congestion affects labor expenses, working hours, and the time taken for goods to be delivered. More generally, traffic congestion strongly influences an individual's quality of life. Therefore, based on the outcome of the literature assessment, it is recommended that further research be conducted to correctly identify policies and corrective measures to address the congestion issue.

Based on the findings of the various research articles reviewed, using deterministic and statistical models can enhance the accuracy of traffic predictions. However, there are some limitations to these models that must be considered:

The models are often developed for specific cities and may not apply to others.

The models often consider a limited number of variables, such as time of day, day of the week, or weather conditions. Other variables, such as the number of lanes on the roadway or the speed limit, may influence congestion but are not always considered by the models.

The models do not always consider the impact that mitigating factors, such as road changes, have on the level of traffic congestion.

Despite these limitations, using these models can still provide valuable insights into the problem of traffic congestion. In addition, it may help develop strategies for managing and alleviating congestion.

12. References:

Bharadwaj, S., Ballare, S., & Chandel, M. K. (2017). Impact of congestion on greenhouse gas emissions for road transport in Mumbai metropolitan region. Transportation Research Procedia, 25, 3538-3551. https://doi.org/10.1016/j.trpro.2017.05.282

Bouyahia, Z., Haddad, H., Derrode, S., & Pieczynski, W. (2022). Traffic state prediction using conditionally Gaussian observed Markov fuzzy switching model. Journal of Intelligent Transportation Systems, 1-20.

Chang, Y. S., Lee, Y. J., & Choi, S. S. B. (2017). Is there more traffic congestion in larger cities? - Scaling analysis of the 101 largest U.S. urban centers. Transport Policy, 59, 54-63.

Das, S., & Nayyar, A. (2019). Innovative ideas to manage urban traffic congestion in cognitive cities. In Driving the development, management, and sustainability of cognitive cities (pp. 139-162). IGI Global.

Di, X., Xiao, Y., Zhu, C., Deng, Y., Zhao, Q., & Rao, W. (2019, June). Traffic congestion prediction by spatiotemporal propagation patterns. In 2019 20th IEEE International Conference on Mobile Data Management (MDM) (pp. 298-303). IEEE.

Emami, A., Sarvi, M., & Asadi Bagloee, S. (2019). Using Kalman filter algorithm for short-term traffic flow prediction in a connected vehicle environment. Journal of Modern Transportation, 27(3), 222-232.

Ewing, R., Tian, G., & Lyons, T. (2018). Does compact development increase or reduce traffic congestion? Cities, 72, 94-101.

Faghih-Imani, A., Anowar, S., Miller, E. J., & Eluru, N. (2017). Hail a cab or ride a bike? A travel time comparison of taxi and bicycle-sharing systems in New York City. Transportation Research Part A: Policy and Practice, 101, 11-21.

Fiedler, D., Čáp, M., & Čertický, M. (2017, October). Impact of mobility-on-demand on traffic congestion: Simulation-based study. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC) (pp. 1-6). IEEE.

FHWA. (2022). Sources for FHWA Reports | FHWA. Highways.dot.gov. https://highways.dot.gov/research/resources/research-library/sources-fhwa-reports

García-Ramírez, Y. (2020). Developing a Traffic Congestion Model based on Google Traffic Data: A Case Study in Ecuador. In VEHITS (pp. 137-144).

Jain, S., Jain, S. S., & Jain, G. (2017). Traffic congestion modeling based on origin and destination. Procedia Engineering, 187, 442-450. https://doi.org/10.1016/j.proeng.2017.04.398

Jayasooriya, S. A. C. S., & Bandara, Y. M. M. S. (2017, May). Measuring the Economic costs of traffic congestion. In 2017 Moratuwa Engineering Research Conference (MERCon) (pp. 141-146). IEEE.

Kohan, M., & Ale, J. M. (2020). Discovering traffic congestion through traffic flow patterns generated by moving object trajectories. Computers, Environment and Urban Systems, 80, 101426.

Koźlak, A., & Wach, D. (2018). Causes of traffic congestion in urban areas. Case of Poland. In SHS Web of Conferences (Vol. 57, p. 01019). EDP Sciences.

Liu, Y.D., Liu, C. and Zheng, Z.L. (2020) Traffic Congestion and Duration Prediction Model Based on Regression Analysis and Survival Analysis. Open Journal of Business and Management, 8, 943-959. https://doi.org/10.4236/ojbm.2020.82059

Lomax, T., Schrank, D., Eisele, B., & Albert, L. (2021). Urban Mobility Report Urban Mobility Information. Mobility.tamu.edu. https://mobility.tamu.edu/umr/

Lu, J., Li, B., Li, H., & Al-Barakani, A. (2021). Expansion of city scale, traffic modes, traffic congestion, and air pollution. Cities, 108, 102974.

Lyu, D., & Lin, Y. (2022). Impact Estimation of Traffic Accident Duration Based on Survival Analysis by Using Field Urban Traffic Condition. In International Conference on Green Building, Civil Engineering and Smart City (pp. 1247-1254). Springer, Singapore.

Marshall, W. E., & Dumbaugh, E. (2018). Revisiting the relationship between traffic congestion and the economy: A longitudinal examination of U.S. metropolitan areas. Transportation, 1-40. https://doi.org/10.1007/s11116-018-9884-5

Metz, D. (2018). Tackling urban traffic congestion: The experience of London, Stockholm, and Singapore. Case Studies on Transport Policy, 6(4), 494-498.

Nagy, A. M., & Simon, V. (2021). Traffic congestion propagation identification method in smart cities. Infocommunications Journal, 13(1), 45-57.

Nbcnewyork.com. (2021, December 7). NYC Ranks as the City With Worst Traffic Congestion in the U.S., Study Finds. NBC New York. https://www.nbcnewyork.com/news/local/nyc-ranks-as-the-city-with-worst-traffic-congestion-in-the-u-s-study-finds/3438472/

Nugmanova, A., Arndt, W. H., Hossain, M. A., & Kim, J. R. (2019). Effectiveness of ring roads in reducing traffic congestion in cities for the long run: big Almaty ring road case study. Sustainability, 11(18), 4973.

Phanhong, M., Likitpanjamanon, P., Chareonwai, P., Srisurapanon, V., & Phunchongharn, P. (2020, September). A spot-recommendation system for taxi drivers using Monte Carlo optimization. In 2020 1st International Conference on Big Data Analytics and Practices (IBDAP) (pp. 1-5). IEEE.

Pavlyuk, D. (2017). On application of regime-switching models for short-term traffic flow forecasting. In Advances in dependability engineering of complex systems (pp. 340-349). Springer, Cham.

Rahman, M. M., Najaf, P., Fields, M. G., & Thill, J. C. (2022). Traffic congestion and its urban scale factors: Empirical evidence from American urban areas. International Journal of Sustainable Transportation, 16(5), 406-421.

Ranjan, N., Bhandari, S., Zhao, H. P., Kim, H., & Khan, P. (2020). City-wide traffic congestion prediction based on CNN, LSTM, and transpose CNN. IEEE Access, 8, 81606-81620.

Rocha Filho, G. P., Meneguette, R. I., Neto, J. R. T., Valejo, A., Weigang, L., Ueyama, J., ... & Villas, L. A. (2020). Enhancing intelligence in traffic management systems to aid in vehicle traffic congestion problems in smart cities. Ad Hoc Networks, 107, 102265.

Sun, D., Chen, D., Zhao, M., Liu, W., & Zheng, L. (2018). Linear stability and nonlinear analyses of traffic waves for the general nonlinear car-following model with multi-time delays. Physica A: Statistical Mechanics and its Applications, 501, 293-307.

Turner, S. M., Benz, R. J., Hudson, J. G., Griffin, G. P., Lasley, P., Dadashova, B., & Das, S. (2019). Improving the amount and availability of pedestrian and bicyclist count data in Texas (No. FHWA/TX-19/0-6927-R1). Texas A&M Transportation Institute.

Wan, J., Yuan, Y., & Wang, Q. (2017, March). Traffic congestion analysis: A new Perspective. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1398-1402). IEEE.

Wang, Q., Wan, J., & Yuan, Y. (2018). Locality constraint distance metric learning for traffic congestion detection. Pattern Recognition, 75, 272-281.

Yang, Y., Xu, Y., Han, J., Wang, E., Chen, W., & Yue, L. (2017). Efficient traffic congestion estimation using multiple Spatio-temporal properties. Neurocomputing, 267, 344-353.

Ye, N., Wang, Z. Q., Malekian, R., Lin, Q., & Wang, R. C. (2015). A method for driving route predictions based on a hidden Markov model. Mathematical Problems in Engineering, 2015.

Xiong, H., Vahedian, A., Zhou, X., Li, Y., & Luo, J. (2018, November). Predicting traffic congestion propagation patterns: A propagation graph approach. In Proceedings of the 11th ACM SIGSPATIAL International Workshop on Computational Transportation Science (pp. 60-69).

Xu, D. W., Wang, Y. D., Jia, L. M., Qin, Y., & Dong, H. H. (2017). Real-time road traffic state prediction based on ARIMA and Kalman filter. Frontiers of Information Technology & Electronic Engineering, 18(2), 287-302.

Zaki, J. F., Ali-Eldin, A., Hussein, S. E., Saraya, S. F., & Areed, F. F. (2020). Traffic congestion prediction based on Hidden Markov Models and contrast measures. Ain Shams Engineering Journal, 11(3), 535-551.

Zhao, P., & Hu, H. (2019). Geographical patterns of traffic congestion in growing megacities: Big data analytics from Beijing. Cities, 92, 164-174.

ABSTRACT

PROBLEM STATEMENT:

Cities are getting choked because of excessive traffic. We will assess the current situation in New York City and propose a better planning model based on statistical analysis of existing data.

DATA:

The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). We will collect the data for January to June 2022 and use it for our analysis. We have also collected e-scooter data from the Citi Bike trip data, which will be used in our study to improve its usage with respect to accessibility and availability.

NYC Taxi Trips and CITI BIKE data links

Source: TLC Trip Record Data - TLC (nyc.gov).

Source: https://www.citibikenyc.com/system-data

METHODOLOGY:

We will analyse the data in depth and build on the findings of the exploratory data analysis.

We will predict the duration of an NYC Yellow Taxi ride based on time and distance. The e-scooter data will be analysed to make the case that increased e-scooter use would reduce traffic snarls in the city.

Concatenation of e-scooter and NYC Yellow Data.

ANTICIPATED RESULT:

The difference between the time taken for a ride during the rush hours of working days and during the other hours of the day will be statistically established. The working-hours travel time should decrease if the planning model is improved along with e-scooter use in the city.
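One way such a difference could be established is a two-sample comparison of ride durations. The sketch below computes Welch's t statistic, which does not assume equal variances in the two groups. The choice of test is our own illustrative assumption, and the function name `welch_t` and the ride durations are hypothetical, not values from the TLC data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    # Squared standard error of each sample mean.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical ride durations in minutes.
rush_hour = [22.0, 25.5, 19.8, 27.1, 24.3]
off_peak = [14.2, 16.8, 15.1, 13.9, 17.4]

t_stat, dof = welch_t(rush_hour, off_peak)
print(f"t = {t_stat:.2f} with about {dof:.1f} degrees of freedom")
```

A large positive t value suggests that rush-hour rides take longer on average. Turning it into a p-value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(rush_hour, off_peak, equal_var=False)` performs the same test and reports the p-value.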

Uploaded By : Pooja Dhaka
Posted on : December 22nd, 2024