FORECASTING THE NUMBER OF ROAD ACCIDENTS IN POLAND USING WEATHER-DEPENDENT TREND MODELS
Piotr Gorzelańczyk

Every year a very large number of people die on the roads. Although the number decreases from year to year, it remains very high. The pandemic reduced the number of road accidents, but the figure is still very high. It is therefore necessary to know under which weather conditions the most road accidents occur, and to forecast the number of accidents under the prevailing weather conditions for the coming years, so that everything possible can be done to minimize the number of road accidents.
The purpose of the article is to forecast the number of road accidents in Poland depending on the prevailing weather conditions. The research was divided into two parts. The first analysed annual Police statistics on the number of road accidents in Poland in 2001-2021 and, on this basis, determined a forecast of the number of road accidents for 2022-2031. The second part dealt with monthly data from 2007-2021, from which a forecast for the period January 2022-December 2023 was determined.
The results of the study indicate that we can still expect a decline in the number of accidents in the coming years, which is particularly evident when analyzing annual data. It is worth noting that the prevailing pandemic distorts the results obtained. The research was conducted in MS Excel, using selected trend models.


Introduction
Road traffic accidents are events that cause not only injury or death to road users, but also damage to property. According to the WHO, approximately 1.3 million people die each year as a result of traffic accidents. Traffic accidents cost most countries in the world around 3% of their GDP. Road traffic accidents are the leading cause of death for children and young people aged 5-29 (The Global Status on Road Safety 2018). The UN General Assembly has set an ambitious goal of halving the number of road deaths and injuries by 2030.
The extent of a traffic accident is an attribute for determining its severity. Predicting the severity of accidents is important for the competent authorities when designing transport safety policies to eliminate accidents and reduce injuries, deaths and property losses (Tambouratzis et al. 2014, Zhu et al. 2019). Identifying the critical factors that affect the severity of accidents is a precondition for taking countermeasures to eliminate accidents and mitigate their severity (Arteaga et al. 2020). Yang et al. (2022) propose a multi-task DNN (Deep Neural Network) framework to predict different levels of severity of injury, death and property loss, which allows a comprehensive and accurate analysis of the severity of traffic accidents.
There are several sources of accident data. They are mostly collected and analyzed by government authorities through the relevant government agencies. Data collection is carried out through police reports, insurance databases or hospital records. Partial traffic accident information is subsequently processed for the transport sector on a larger scale (Gorzelańczyk et al. 2020).
Intelligent transportation systems are currently the most important source of data for the analysis and prediction of traffic accidents. The data can be processed thanks to the use of GPS devices in vehicles (Chen 2017). Microwave vehicle detection systems at roadsides can continuously record vehicle data (speed, traffic volume, vehicle type, etc.) (Khaliq et al. 2019). The Vehicle License Plate Recognition system also makes it possible to collect large amounts of traffic data over a monitored period (Rajput et al. 2015). Social media can be another source of traffic and accident information, but its relevance may be insufficient because of the reporters' lack of expertise (Zheng et al. 2018).
For accident data to be relevant, it is necessary to work with several data sources that are correctly cross-checked against one another. Combining different data sources by consolidating heterogeneous traffic accident data helps to increase the accuracy of the analysis results (Abdullah, Emam 2016).
Vilaca et al. (2017) performed a statistical survey aimed at assessing accident severity and finding the connection between traffic accidents and road users. The result of the study is a proposal to improve road safety standards and to adopt other policies related to transport safety.
Bak et al. (2019) conducted a statistical survey of traffic safety in a selected region of Poland, based on the number of traffic accidents and an investigation of the causes of their occurrence. The survey applied multidimensional statistical analysis to examine the safety aspects of persons responsible for accidents.
The choice of the source of accident data for the analysis depends on the type of traffic problem being addressed. Combining statistical models with naturalistic driving data or other data obtained through intelligent transport systems increases the accuracy of accident forecasts and contributes to the elimination of accidents (Chand et al. 2021).
Various methods of forecasting the number of accidents can be found in the literature. Time series methods are used most often (Helgason 2016, Lavrenz 2018); their disadvantages are the impossibility of assessing the quality of a forecast on the basis of expired forecasts and the frequent autocorrelation of the residual component. The vector autoregression model has also been used to forecast the number of road accidents; its drawback is the need for a large number of observations of the variables in order to estimate its parameters correctly (Wójcik 2014). Monederoa et al. (2021) used autoregression models to analyse the number of fatalities, and Al-Madani (2018) used curve-fitting regression models. These, in turn, capture only simple linear relationships (Mamczur 2022) and require the order of the autoregression to be determined, assuming that the series are already stationary (Piłatowska 2012).
Biswas et al. (2019) used Random Forest regression to predict the number of road accidents. The drawbacks of this method are that, when the data contain groups of correlated features of similar significance, smaller groups are favoured over larger ones (Las losowy 2022), and that there is instability in the method and in spike prediction (Fijorek et al. 2010). Chudy-Laskowska and Pisula (2014) used the autoregressive model with quadratic trend, the univariate periodic trend model and the exponential smoothing model for this forecasting problem. A moving average model can also be used; its disadvantages are low forecast accuracy, loss of data in the sequence, and failure to account for trends and seasonal effects (Kashpruk 2010). Procházka and Camej (2017) used the GARMA method, in which some restrictions are imposed on the parameter space to guarantee the stationarity of the process. Very often the ARMA model is used for forecasting a stationary process, and ARIMA or SARIMA for a non-stationary process (Procházka et al. 2017, Sunny et al. 2018, Dutta et al. 2020, Karlaftis et al. 2009). These models are very flexible, but this is also their disadvantage, as good model identification requires more experience from the researcher than, for example, regression analysis (Łobejko et al. 2015). Another disadvantage is the linear nature of the ARIMA model (Dudek 2013).
Chudy-Laskowska and Pisula (2015) used the ANOVA method to forecast the number of road crashes. The disadvantage of this method is that it requires additional assumptions, especially the assumption of sphericity, the violation of which may lead to erroneous conclusions (Gregorczyk, Swarcewicz 2012). Neural network models are also used to forecast the number of road accidents. The disadvantages of ANNs are the need for experience in this field (Chudy-Laskowska, Pisula 2017, Wrobel 2017), the dependence of the final solution on the initial conditions of the network, and the lack of interpretability in the traditional sense: an ANN is usually described as a black box that takes an input and returns an output without revealing anything about the analysis (Techniki zgłębiania danych (data mining) 2022).
A new prediction method is the use of the Hadoop model by Kumar et al. Based on the above review, trend models were chosen to predict the number of traffic accidents depending on weather conditions.

Number of road accidents
Every year a very large number of people are killed on the roads. Although the number decreases from year to year, it remains very high. The pandemic reduced the number of road accidents, but the figure is still very high. Analysis of the annual and monthly data on the number of road accidents by prevailing weather conditions shows clear fluctuations with a continuing downward trend. Compared with the European Union, the number of accidents in Poland is still very high. For this reason, every effort should be made to forecast the number of accidents for the coming years under different weather conditions (Fig. 1, 2).
For the purpose of this work, the following was assumed (Krzyczkowska 2019):
- good atmospheric conditions are:
• air temperature > 3°C,
• no precipitation,
• wind < 5.5 m/s,
• visibility > 10 km,
• pressure difference over the day < 8 hPa;
- bad weather conditions (if at least one of the following factors is met) are:
• slippery pavement (temperature < 3°C and occurrence of precipitation),
• heavy rain (temperature > 0°C, precipitation > 3 mm),
• snowstorm (temperature < 0°C, precipitation > 3 mm),
• strong wind (wind > 10 m/s),
• dense fog (visibility < 300 m).
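The thresholds listed above translate directly into a rule-based classifier. The following is a minimal sketch, not the author's implementation: the `DailyWeather` record, its field names, the reading of "occurrence of precipitation" as any positive amount, and the `unclassified` label for days that meet neither definition are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DailyWeather:
    """Hypothetical record of one day's weather (field names are assumptions)."""
    temperature_c: float      # air temperature, degrees Celsius
    precipitation_mm: float   # precipitation total, mm
    wind_ms: float            # wind speed, m/s
    visibility_m: float       # visibility, m
    pressure_diff_hpa: float  # pressure difference over the day, hPa


def bad_weather_factors(w: DailyWeather) -> list:
    """Return the names of all 'bad weather' factors met by the record."""
    factors = []
    if w.temperature_c < 3 and w.precipitation_mm > 0:
        factors.append("slippery pavement")
    if w.temperature_c > 0 and w.precipitation_mm > 3:
        factors.append("heavy rain")
    if w.temperature_c < 0 and w.precipitation_mm > 3:
        factors.append("snowstorm")
    if w.wind_ms > 10:
        factors.append("strong wind")
    if w.visibility_m < 300:
        factors.append("dense fog")
    return factors


def is_good_weather(w: DailyWeather) -> bool:
    """All 'good conditions' criteria must hold at the same time."""
    return (
        w.temperature_c > 3
        and w.precipitation_mm == 0
        and w.wind_ms < 5.5
        and w.visibility_m > 10_000  # 10 km expressed in metres
        and w.pressure_diff_hpa < 8
    )


def classify(w: DailyWeather) -> str:
    """Label a record 'bad' if any bad factor is met, 'good' if all good
    criteria hold, and 'unclassified' (an assumed label) otherwise."""
    if bad_weather_factors(w):
        return "bad"
    if is_good_weather(w):
        return "good"
    return "unclassified"
```

Under these assumptions a mild, dry, calm day is labelled `good`, while a day at 1°C with 5 mm of precipitation is labelled `bad` because it meets both the slippery-pavement and the heavy-rain criteria.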

Forecasting the number of traffic accidents
The following trend models were used in forecasting the number of traffic accidents for the analyzed weather conditions:
- exponential,
- linear,
- logarithmic,
- polynomial of 2nd degree,
- polynomial of 3rd degree,
- polynomial of 4th degree,
- polynomial of 5th degree,
- polynomial of 6th degree,
- power.
In the first step, the mathematical formula of each trend model was determined for the annual and monthly data. As can be seen from the R-squared coefficient, which measures the quality of the model fit, the fit is good or satisfactory in most cases for annual data, while for monthly data it is poor and unsatisfactory. This is mainly due to the seasonality of the number of traffic accidents in the weather conditions analyzed, with the fewest accidents during fog and the most during good conditions (Tab. 1-7). Model parameters were determined using the least squares method.
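The fitting step can be illustrated outside a spreadsheet. The sketch below is an assumption-laden illustration rather than the procedure used in the study: it fits the listed trend families with NumPy's least squares routines, estimates the exponential and power models on log-transformed data (a common linearisation, assumed here), and reports R-squared for each. The function names and the demonstration series are hypothetical, and the series is synthetic, not Police data.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_trend_models(y, max_degree=6):
    """Fit trend models to a positive series indexed by t = 1, 2, ..., n.

    Returns the time index and a dict mapping model name to a callable
    that evaluates the fitted trend at arbitrary t.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    models = {}

    # Linear and higher-degree polynomial trends: ordinary least squares.
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(t, y, degree)
        name = "linear" if degree == 1 else f"polynomial_{degree}"
        models[name] = lambda x, c=coeffs: np.polyval(c, x)

    # Logarithmic trend: y = a * ln(t) + b.
    a_log, b_log = np.polyfit(np.log(t), y, 1)
    models["logarithmic"] = lambda x, a=a_log, b=b_log: a * np.log(x) + b

    # Exponential trend y = a * exp(b * t), fitted as ln(y) = ln(a) + b * t.
    b_exp, ln_a_exp = np.polyfit(t, np.log(y), 1)
    models["exponential"] = (
        lambda x, a=np.exp(ln_a_exp), b=b_exp: a * np.exp(b * x)
    )

    # Power trend y = a * t**b, fitted as ln(y) = ln(a) + b * ln(t).
    b_pow, ln_a_pow = np.polyfit(np.log(t), np.log(y), 1)
    models["power"] = lambda x, a=np.exp(ln_a_pow), b=b_pow: a * x ** b

    return t, models


# Synthetic, purely illustrative series with 21 annual points.
t_demo = np.arange(1, 22)
y_demo = 40000.0 * np.exp(-0.03 * t_demo)

t, models = fit_trend_models(y_demo)
fit_quality = {name: r_squared(y_demo, f(t)) for name, f in models.items()}
```

Because the exponential and power models are fitted in log space, their R-squared on the original scale can differ from the value a spreadsheet reports for the transformed regression; that difference is worth checking before comparing numbers across tools.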
Then, using the data in Tables 1-7, the projected number of traffic accidents was determined. For annual data this was the period 2022-2031, while for monthly data it was the period from January 2022 to December 2023. The forecast in this case was based on trend models and historical data. The result of a forecast made with this method depends on the choice of the model and its fit.
In the next step, expired forecast errors were determined for the obtained forecasts based on equations (1-5):
• ME - mean error
$$ME = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - Y_{p,i}\right) \quad (1)$$
• MAE - mean absolute error
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - Y_{p,i}\right| \quad (2)$$
• MPE - mean percentage error
$$MPE = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i - Y_{p,i}}{Y_i} \quad (3)$$
• MAPE - mean absolute percentage error
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|Y_i - Y_{p,i}\right|}{Y_i} \quad (4)$$
• MSE - mean square error
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - Y_{p,i}\right)^2 \quad (5)$$
where: n - number of observations of road accidents, Y_i - observed value of road accidents, Y_{p,i} - forecasted value of road accidents.
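The five error measures named above are straightforward to compute. The sketch below assumes their standard textbook definitions, with the error taken as observed minus forecasted value and the percentage errors returned as fractions (multiply by 100 for percent). The function name `forecast_errors` is hypothetical.

```python
import numpy as np


def forecast_errors(y, y_p):
    """Expired forecast errors for observed values y and forecasts y_p.

    Assumes standard definitions; observed values must be non-zero
    because MPE and MAPE divide by them.
    """
    y = np.asarray(y, dtype=float)
    y_p = np.asarray(y_p, dtype=float)
    e = y - y_p  # error: observed minus forecasted
    return {
        "ME": float(np.mean(e)),               # mean error
        "MAE": float(np.mean(np.abs(e))),      # mean absolute error
        "MPE": float(np.mean(e / y)),          # mean percentage error
        "MAPE": float(np.mean(np.abs(e / y))), # mean absolute percentage error
        "MSE": float(np.mean(e ** 2)),         # mean square error
    }


# Tiny illustrative input, not taken from the study.
errors = forecast_errors([100.0, 200.0], [110.0, 180.0])
```

Note that ME and MPE can be close to zero even when individual errors are large, because positive and negative errors cancel; MAE, MAPE and MSE do not have this property.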
To forecast the number of traffic accidents depending on the prevailing weather conditions, the trend models with the smallest mean percentage error and mean absolute percentage error were selected. On this basis, it was found that for annual data the exponential model was the best fit in most cases, while the linear model was the best fit for good weather conditions and the power model for strong wind conditions. For all atmospheric conditions analysed, the maximum MAPE error was 12.3%. For monthly data, the exponential model also proved to be the best, except for solar glare and cloudy conditions, for which the linear model was best; however, the error ranged from 79% to 28794%, which is a very large value (Tables 8, 9). On this basis, the projected number of accidents for the following years was determined on a monthly and annual basis (Tab. 10, 11, Fig. 3, 4). Based on Tables 10 and 11 and Figures 3 and 4, we can expect a further decrease in the number of traffic accidents in the following years. Note that the pandemic has caused significant changes in the forecasts. As can be seen in Figure 4, the trend models do not take into account the seasonality present in traffic accidents and should not be used in the case under analysis.
[Table fragment: forecast error values; column headings not recoverable]
(unlabelled row) | 72,538,421,905.45 | 4,894,311.41 | 2,876,780,693.66 | 97,038,950.94 | 1,046,654,730.96 | 104,509,476,326.06 | 8,366,291,597.63
MSE | 402,991,232.81 | 27,190.62 | 15,982,114.96 | 539,105.28 | 5,814,748.51 | 580,608,201.81 | 46,479,397
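The selection step described above, choosing the trend model with the smallest error and extrapolating it, can be sketched as follows. This is a simplified illustration under stated assumptions: only three candidate models are compared, selection uses MAPE alone rather than MPE and MAPE together, the exponential and power models are fitted on log-transformed data, and all function names and the demonstration series are hypothetical.

```python
import numpy as np


def mape(y, y_p):
    """Mean absolute percentage error, returned as a fraction."""
    return float(np.mean(np.abs((y - y_p) / y)))


def candidate_fits(t, y):
    """Fit three candidate trends by least squares; y must be positive."""
    lin = np.polyfit(t, y, 1)
    b_exp, ln_a_exp = np.polyfit(t, np.log(y), 1)
    b_pow, ln_a_pow = np.polyfit(np.log(t), np.log(y), 1)
    return {
        "linear": lambda x: np.polyval(lin, x),
        "exponential": lambda x: np.exp(ln_a_exp) * np.exp(b_exp * x),
        "power": lambda x: np.exp(ln_a_pow) * x ** b_pow,
    }


def select_and_forecast(y, horizon):
    """Pick the candidate with the smallest in-sample MAPE and
    extrapolate it `horizon` periods beyond the end of the series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    fits = candidate_fits(t, y)
    scores = {name: mape(y, f(t)) for name, f in fits.items()}
    best = min(scores, key=scores.get)
    t_future = np.arange(len(y) + 1, len(y) + horizon + 1, dtype=float)
    return best, scores[best], fits[best](t_future)


# Synthetic series following a power law (illustrative only):
# 21 "annual" points, forecast 10 periods ahead.
t_demo = np.arange(1, 22, dtype=float)
y_demo = 1000.0 * t_demo ** -0.5
best_model, best_mape, forecast = select_and_forecast(y_demo, horizon=10)
```

A selection of this kind only ranks the candidates it is given. If the series is seasonal, as the monthly data are, every smooth trend will score poorly and the smallest MAPE does not imply a usable forecast.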

Conclusions
Forecasts of the number of accidents in Poland for the analysed weather conditions were determined by selected trend models using Excel. The results show that we can still expect a decrease in the number of traffic accidents in the coming years. It should be noted that the pandemic has skewed the results obtained, and if it continues and traffic restrictions are imposed, the proposed model may not be adequate. The average MAPE for the annual data was 3%, which may indicate that an effective forecasting method was chosen. As can be seen, trend models fail when forecasting the monthly number of traffic accidents, where there is seasonality. In contrast, for annual data the results are good. The advantage of trend models is the speed with which a forecast can be determined.
The forecast of the number of traffic accidents obtained in the article can be used in the future to formulate further measures to minimize the number of accidents in the analysed country. These measures may include, for example, the introduction of higher fines for traffic offenses on Polish roads from January 1, 2022.
In further research, the author plans to take into account more factors influencing accident rates in Poland, such as traffic volume, day of the week or age of the accident perpetrator, and to apply other methods of forecasting the number of road accidents.