Clustering Based on Forecasting Density: Case Study of Unemployment Rate in Iran's Provinces

Authors

1 Assistant Professor, Department of Economics, Ayatollah Boroujerdi University

2 Assistant Professor, Department of Mathematics, Ayatollah Boroujerdi University

Abstract

Regional planners and policymakers need to know how provincial unemployment rates are likely to evolve over specified time horizons. This paper investigates the clustering of time series based on their forecast densities at a specified horizon. The algorithm uses a bootstrap procedure to approximate the distribution of the predictions. The differences between each pair of bootstrap densities generate a dissimilarity matrix, which is then used for clustering. Seasonal (quarterly) unemployment data from spring 2005 to fall 2017 are used, and the forecast density algorithm is applied to cluster the unemployment rates of Iran's provinces at two horizons: 4 steps (one year) and 10 steps (two and a half years). At both horizons, the provinces of Semnan and Zanjan are in the best situation and Lorestan and Kermanshah in the worst. Across the two horizons, all but a few provinces remain in their main clusters. The spatial distribution of unemployment in Iran, based on forecast density clustering, shows that the western and southwestern provinces will have the highest unemployment rates. Regional planning and serious attention to employment in these provinces are therefore recommended. Moreover, provinces in an unfavorable situation tend to have high-unemployment neighbors, while provinces with low unemployment rates mostly have low-unemployment neighbors. In other words, unemployment rates in neighboring provinces are positively spatially correlated.
Extended Abstract
Introduction:
Managing raw data and extracting useful information play an important role in decision making. Clustering, one of the descriptive data mining methods, organizes the data into a number of clusters in such a way that objects in the same cluster are more similar to each other than to those in other clusters. Alonso et al. (2006) proposed a dissimilarity measure based on the forecast densities of each observed series in the sample at a given future horizon. They combined a smoothed sieve bootstrap procedure with nonparametric kernel density estimation to approximate the distribution of the predictions. Vilar et al. (2010) extended this method to cover nonparametric nonlinear autoregressive models.
In this paper, both feature-based and model-based approaches are used to cluster time series data. The main purpose is to cluster the series on the basis of their full forecast densities rather than their point predictions. Two series fall into the same cluster when their forecast densities at a given future horizon are similar. Clustering and dissimilarity based on forecast densities are easy to interpret. The unemployment rate is a good indicator of the balance between the key pillars of a country's economy, so it is important for the authorities to address it. To show the efficiency of this clustering method, the provinces of Iran are clustered by their unemployment rates at horizons of four and ten steps ahead.
Methodology:
In this paper, clustering is performed based on full forecast densities instead of focusing on point predictions. Suppose $\{X_t\}$ and $\{Y_t\}$ are two stationary processes, each with an autoregressive representation of the form

$$X_t = \varphi(X_{t-1}, \ldots, X_{t-p}) + \varepsilon_t,$$

where $\varepsilon_t$ is a white noise and $\varphi$ is a smooth function which is not constrained by any predetermined parametric model. At a specified future time $T+h$, let

$$D_{XY} = \int \left| f_{X_{T+h}}(u) - f_{Y_{T+h}}(u) \right| \, du,$$

where $f_{X_{T+h}}$ and $f_{Y_{T+h}}$ denote the density functions of the forecasts $X_{T+h}$ and $Y_{T+h}$, respectively. $D_{XY}$ denotes the distance between $X_T$ and $Y_T$ at the specified future time $T+h$, written here in its $L^1$ form. The true forecast densities are unknown, so they are replaced by kernel (nonparametric) estimates based on bootstrap predictions.
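The procedure described above (bootstrap predictions, kernel density estimates, pairwise distances, clustering) can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the implementation used in the study: a linear AR(1) model with a conditional residual bootstrap stands in for the nonparametric autoregression and sieve bootstrap, the densities are compared with an $L^1$ distance on a common grid, and complete-linkage clustering is assumed. All function names and the synthetic series are hypothetical.

```python
import math
import random
import statistics

def fit_ar1(x):
    """Least-squares fit of x[t] = c + a * x[t-1] + e[t]; returns (c, a, residuals)."""
    prev, curr = x[:-1], x[1:]
    mp, mc = statistics.fmean(prev), statistics.fmean(curr)
    a = (sum((p - mp) * (q - mc) for p, q in zip(prev, curr))
         / sum((p - mp) ** 2 for p in prev))
    c = mc - a * mp
    residuals = [q - c - a * p for p, q in zip(prev, curr)]
    return c, a, residuals

def bootstrap_forecasts(x, h, n_boot, rng):
    """Simulate n_boot values of X[T+h] by resampling residuals along each path."""
    c, a, residuals = fit_ar1(x)
    forecasts = []
    for _ in range(n_boot):
        value = x[-1]
        for _ in range(h):
            value = c + a * value + rng.choice(residuals)
        forecasts.append(value)
    return forecasts

def kde(sample, grid):
    """Gaussian kernel density estimate on grid with a normal-reference bandwidth."""
    n = len(sample)
    bw = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    norm = n * bw * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((g - s) / bw) ** 2) for s in sample) / norm
            for g in grid]

def dissimilarity_matrix(series, h, n_boot=300, n_grid=200, seed=0):
    """Pairwise L1 distances between the estimated h-step forecast densities."""
    rng = random.Random(seed)
    samples = [bootstrap_forecasts(x, h, n_boot, rng) for x in series]
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    pad = 0.1 * (hi - lo)
    lo, hi = lo - pad, hi + pad
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = [kde(s, grid) for s in samples]
    n = len(series)
    # Riemann-sum approximation of the integral of |f_i - f_j|
    return [[step * sum(abs(p - q) for p, q in zip(dens[i], dens[j]))
             for j in range(n)] for i in range(n)]

def complete_linkage(d, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest members are closest, until k clusters remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: max(d[i][j] for i in clusters[pq[0]]
                                             for j in clusters[pq[1]]))
        clusters[a] += clusters.pop(b)
    return clusters

def simulate(level, n, rng):
    """Synthetic AR(1) series around a given level (not the provincial data)."""
    x = [level]
    for _ in range(n - 1):
        x.append(level + 0.6 * (x[-1] - level) + rng.gauss(0, 0.5))
    return x

rng = random.Random(1)
# Two low-level and two high-level synthetic series, 51 quarterly points each
series = [simulate(8.0, 51, rng), simulate(8.5, 51, rng),
          simulate(15.0, 51, rng), simulate(15.5, 51, rng)]
d = dissimilarity_matrix(series, h=4)
clusters = complete_linkage(d, k=2)
print(sorted(sorted(c) for c in clusters))
```

Because each estimated density integrates to roughly one, the $L^1$ distance lies between 0 (identical forecast densities) and about 2 (densities with no overlap), which makes the entries of the dissimilarity matrix easy to read. Changing `h` from 4 to 10 gives the matrix for the longer horizon.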
Results and discussions:
Using the proposed algorithm, the unemployment rates of Iran's provinces were clustered for two time horizons, 4 seasons and 10 seasons ahead. Seasonal provincial unemployment rates from spring 2005 to autumn 2017 were obtained from the Statistical Center of Iran. Some of the results are:
Clustering for 4 future seasons:

Cluster 1) Zanjan, Semnan, Kerman, Mazandaran, Khorasan Razavi, North Khorasan, South Khorasan, Yazd, East Azerbaijan, Golestan, Markazi, Hormozgan, Qazvin, Qom, Sistan and Baluchestan, Bushehr and Tehran.
Cluster 2) Chaharmahal & Bakhtiari, Kohkiluyeh & Boyerahmad, Kermanshah, Lorestan, Kurdistan, West Azerbaijan, Hamedan, Ilam, Gilan, Ardabil, Fars, Isfahan, Khuzestan.

When forecasting 10 seasons ahead, the provinces of West Azerbaijan, Hamedan, Kurdistan, Ardabil, Khuzestan and Isfahan move from main cluster 2 to main cluster 1. Since the situation in the provinces of cluster 1 is better than in cluster 2, these provinces are expected to be in a better situation two and a half years ahead than one year ahead. Kermanshah and Lorestan have the worst situation at both the 4-step and 10-step horizons.
At present, the provinces of East Azerbaijan and Yazd have the lowest average unemployment rates in the country. At the above horizons, however, although they remain in cluster 1 (provinces in good standing), the forecast densities of Zanjan and Semnan are more favorable. According to the clustering results, the western and southwestern provinces will have the highest unemployment rates.
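The change in main-cluster membership between the two horizons can be tallied with a short set computation. The sketch below only restates the lists reported above; it assumes that the six named provinces are the only ones changing main cluster, and it uses one spelling per province name. The variable names are hypothetical.

```python
# Main clusters at the 4-season horizon, as listed in the results above
cluster1_h4 = {
    "Zanjan", "Semnan", "Kerman", "Mazandaran", "Khorasan Razavi",
    "North Khorasan", "South Khorasan", "Yazd", "East Azerbaijan",
    "Golestan", "Markazi", "Hormozgan", "Qazvin", "Qom",
    "Sistan and Baluchestan", "Bushehr", "Tehran",
}
cluster2_h4 = {
    "Chaharmahal & Bakhtiari", "Kohkiluyeh & Boyerahmad", "Kermanshah",
    "Lorestan", "Kurdistan", "West Azerbaijan", "Hamedan", "Ilam",
    "Gilan", "Ardabil", "Fars", "Isfahan", "Khuzestan",
}

# Provinces reported as moving from main cluster 2 to main cluster 1
moved = {"West Azerbaijan", "Hamedan", "Kurdistan", "Ardabil",
         "Khuzestan", "Isfahan"}
assert moved <= cluster2_h4  # every mover starts in cluster 2

# Implied main clusters at the 10-season horizon (assumption: no other moves)
cluster1_h10 = cluster1_h4 | moved
cluster2_h10 = cluster2_h4 - moved

print(len(cluster1_h4), len(cluster2_h4))    # 17 13
print(len(cluster1_h10), len(cluster2_h10))  # 23 7
print(sorted(cluster2_h10))
```

Under that assumption, seven provinces remain in main cluster 2 at the longer horizon, including Kermanshah and Lorestan.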
Conclusion:
A new clustering method based on forecast densities is presented. The time series of provincial unemployment rates were clustered at two time horizons, 4 seasons and 10 seasons ahead. The results show that Semnan and Zanjan provinces will be in the best situation at both horizons, and Lorestan and Kermanshah provinces in the worst. Across the two horizons, all but a few provinces remain in their original clusters.
The spatial distribution of unemployment in Iran based on forecast density clustering shows that the western and southwestern provinces will still have the highest unemployment rates. There is also a positive spatial correlation between the unemployment rates of neighboring provinces. The results of Rezvani et al. (2013) indicate high unemployment rates in the south and west of the country. The present results suggest that high unemployment can be expected to persist in the same areas, with the exception of southeastern provinces such as Sistan and Baluchestan, Kerman and Hormozgan, as well as Bushehr.
The results of this study call for province-level economic policies to reduce employment inequalities. It is also suggested that other macroeconomic variables be clustered with this approach to provide a clearer outlook for economic policy making. In addition, the literature shows that unemployment has a strong impact on migration, crime and suicide. Addressing high unemployment rates in the affected provinces over the next 4 and 10 seasons can therefore reduce the impact of such phenomena.

Keywords

References

  1. Abbasi Nejad H, Ramezani H, Sadeghi M. (2013). A Study of the Relation between Unemployment and Crime in Iran Provincial Consolidated Data Approach. 3. 20(64), 65-86 (In Persian).
  2. Aghabozorgi, S. Shirkhorshidi, A.S. and Wah, T.Y. (2015). Time-series clustering – A decade review. Information Systems, Vol. 53, pp. 16-38.
  3. Akhbari, R., Taee, H. (2017). Identifying hysteresis effect in unemployment rate with emphasis on second generation panel unit root and PANIC method. Journal of Applied Economics Studies in Iran, 6(22), 1-31. doi: 10.22084/aes.2017.11136.2209 (In Persian).
  4. Alonso, A.M. Berrendero J.R. Hernandez A. and Justel A. (2006): Time Series Clustering Based on Forecast Densities. Computational Statistics and Data Analysis, Vol. 51, pp. 762-776.
  5. Cowpertwait, P.S.P. Cox, T.F. (1992): Clustering population means under heterogeneity of variance with an application to a rainfall time series problem. The Statistician, Vol. 41, pp. 113–121.
  6. Cracolici, M.F. Cuffaro, M. and Nijkamp, P. (2007): A spatial analysis on Italian unemployment differences. Statistical Methods and Applications, Vol. 18, Issue 2, pp. 275-291.
  7. Feizpour, M., Lotfi, E. (2015). Economical Distinctions and Social Problems of Iran: Rates of Unemployment and Suicide. Strategic Research on Social Problems in Iran, University of Isfahan, 4(1), 153-166 (In Persian).
  8. Filiztekin, A. (2009): Regional unemployment in Turkey, Regional Science, Vol. 88, Issue 4, pp. 863-878.
  9. Fruhwirth-Schnatter, S., Kaufmann, S. (2004): Model-based clustering of multiple time series. CEPR Discussion Paper No. 4650.
  10. Galeano, P. and Peña, D. (2000): Multivariate analysis in vector time series, Resenhas, Vol. 4, pp. 383–403.
  11. Garcilazo, J. E. (2006): Regional unemployment clusters: neighborhood and state effects in Europe and North America, The Review of Regional Studies, Vol. 37, Issue 3, pp. 282-302.
  12. Gharavi Nakhjavani, A. (2002): "The Unemployment Crisis in the Iranian Economy", Economic Quarterly, No. 6, 171-184 (In Persian).
  13. Hosseini, G., Sadeqi, R., Ghasemi, A., Rostamalizadeh, V. (2018). Trends and Patterns of Internal Migration in Iran. Journal of Regional Planning, 8(31), 1-18 (In Persian).
  14. Imani, M. (2012): Clustering of Persian texts, Master's thesis, Faculty of Computer Engineering, Sharif University of Technology. (In Persian).
  15. Iran Statistical Center, "Employment and Unemployment Indicators of the country in 1376-1391", Tehran (In Persian).
  16. Izadparast, M. (2011): Classification of Insurance Customers Using Data Mining, Payame Noor University of Tehran, New World Insurance Monthly, No. 161 (In Persian).
  17. Kakizawa, Y. Shumway, R.H. and Taniguchi, M. (1998): Discrimination and clustering for multivariate time series, Journal of the American Statistical Association, Vol. 93, pp. 328–340.
  18. Kaufman, L. and Rousseeuw, P.J. (2005): Finding groups in data: an introduction to cluster analysis, Hoboken, NJ: Wiley.
  19. Liao, W. T. (2005): Clustering of time series data-a survey. Pattern recognition, Vol. 38, Issue 11, pp. 1857-1874.
  20. Lopez-Bazo, E. and Motellon, E. (2011): The regional distributions of unemployment. What do micro-data tell us?, Research Institute of Applied Economics.
  21. Macchiato, M.F. La Rotonda, L. Lapenna, V. and Ragosta, M. (1995): Time modelling and spatial clustering of daily ambient temperature: an application in Southern Italy, Environmetrics, Vol. 6, pp. 31–53.
  22. Maharaj, E.A. (1996): A significance test for classifying ARMA models, Journal of Statistical Computation and Simulation, Vol. 54, pp. 305–331.
  23. Montero P. and Vilar, J.A. (2015): TSclust: An R Package for Time Series Clustering, Journal of Statistical Software, Vol. 62, Issue 1, pp. 1-43.
  24. Motii-Haghshenas, N. (2002): "A Comparative Study of the Employment and Unemployment Rate of Population in Different Provinces of Iran", First Conference of Iranian Demographic Association, Tehran, Iranian Demographic Association (In Persian).
  25. Noghani Dokht Bahmani, M., Mir Mohamad Tabar, S. (2016). Study of Economic Factors Affecting the Crime (Meta-analysis of Research Conducted in Iran). Strategic Research on Social Problems in Iran, University of Isfahan, 4(3), 85-102 (In Persian).
  26. Pattarin, F. Paterlini, S. and Minerva, T. (2004): Clustering financial time series: an application to mutual funds style analysis, Computational Statistics and Data Analysis, Vol. 47, pp. 353–372.
  27. Piccolo, D., (1990). A distance measure for classifying ARIMA models. J. Time Ser. Anal. 11, 153–164.
  28. Radmehr, F. and Alamolhoda, S. H. (2014). Cluster analysis: a tool for analyzing data in quantitative and mixed method studies. Psychological Methods and Models, fourth year, No. 15, pp. 36-13. (In Persian).
  29. Rezvani, M., Mansourian, H., Mahmoudian zamaneh, M., Heydarian Mohammadabadi, R. (2013). Spatial analysis of unemployment in Urban and Rural Areas in Iran With exploratory Spatial Data Analysis Approach. Physical Social Planning, 1(3), 37-48 (In Persian).
  30. Saadat, M., Mila Elmi, Z. and Akbari, N. (2009). Analysis of Spatial Unemployment in Iran, Nameye Mofid, 19(69), 151-170 (In Persian).
  31. Seidaii, S., Bahari, E. and Zarei, A. (2011): "Investigation of the Employment and Unemployment Status in Iran during the Years 1335-1389". Jasmine Strategy, No. 25, 241-216 (In Persian).
  32. Shahyadi, S (2004): "Spatial Analysis and Labor Demand in Iran", MSc Thesis, Faculty of Administrative Sciences and Economics, University of Isfahan (In Persian).
  33. Tong, H. Yeung, I. (2000): On Tests for Self-Exciting Threshold Autoregressive-Type Non-Linearity in Partially Observed Time Series, Journal of the Royal Statistical Society C, Vol. 40, Issue 1, pp. 43-62.
  34. Vilar, J.A. Alonso, A.M. and Vilar, J.M. (2010): Non-Linear Time Series Clustering Based on Non-Parametric Forecast Densities, Computational Statistics and Data Analysis, Vol. 54, Issue 11, pp. 2850-2865.