Examining the impact of cross-domain learning on crime prediction

  • Bappee, Fateha Khanam (1)
  • Soares, Amilcar (2)
  • Petry, Lucas May (3)
  • Matwin, Stan (1, 4)
  • (1) Dalhousie University, Halifax, Nova Scotia, Canada
  • (2) Memorial University of Newfoundland, St. John’s, Canada
  • (3) Universidade Federal de Santa Catarina, Florianópolis, Brazil
  • (4) Polish Academy of Sciences, Warsaw, Poland
Published Article
Journal of Big Data
Springer Nature
Publication Date
Jul 03, 2021
DOI: 10.1186/s40537-021-00489-9
  • Research


Urban data such as demographics, infrastructure, and criminal records are becoming increasingly accessible to researchers. This has advanced quantitative crime research, which aims to predict future crime occurrences by identifying the factors and knowledge in past instances that contribute to criminal activity. Although the distribution of crime across geographic space is asymmetric, analogous, implicit criminogenic factors are often hidden in the data. However, because such data are less available and less comprehensive for smaller cities, it is challenging to build a uniform framework that applies to all geographic regions. This paper approaches the crime prediction task from a cross-domain perspective to tackle the data-insufficiency problem of a small city. We build a uniform framework for Halifax, Nova Scotia, by adapting and learning knowledge from two other Canadian domains, Toronto and Vancouver, whose data follow distributions that differ from, but are related to, that of Halifax. To transfer knowledge between the source and target domains, we propose instance-based transfer learning settings. Each setting learns knowledge from a seasonal perspective using cross-domain data fusion. We use ensemble learning methods for model building because of their ability to generalize to new data. We evaluate classification performance for both single- and multi-domain representations and compare the results with baseline models. Our findings show that the proposed data-driven approach, which integrates multiple data sources, achieves satisfactory performance.
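The abstract does not specify how source instances are weighted, how seasons are defined, or which ensemble method is used, so the following is only a minimal Python sketch of the general idea of instance-based transfer with seasonal selection, not the authors' method. It assumes that source-domain instances (Toronto, Vancouver) from the same season as the target-domain instances (Halifax) are pooled with them at a fixed, hypothetical down-weight (`source_weight`), and that the pooled weights seed an AdaBoost ensemble of decision stumps, used here as a stand-in ensemble learner. The season mapping, the ±1 label encoding, and all names (`Instance`, `seasonal_training_set`, `adaboost`, `predict`) are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    features: tuple[float, ...]
    label: int   # hypothetical encoding: +1 = crime occurs, -1 = no crime
    month: int   # 1..12
    domain: str  # e.g. "Halifax" (target), "Toronto" or "Vancouver" (sources)


# Assumed Northern Hemisphere meteorological seasons.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}


def seasonal_training_set(target, sources, season, source_weight=0.5):
    """Pool same-season target and source instances; source instances get a
    lower (hypothetical, fixed) weight. Returns (x, y, w) with weights summing to 1."""
    pooled = [(i.features, i.label, 1.0) for i in target if SEASONS[i.month] == season]
    pooled += [(i.features, i.label, source_weight)
               for i in sources if SEASONS[i.month] == season]
    if not pooled:
        raise ValueError(f"no instances for season {season!r}")
    total = sum(w for _, _, w in pooled)
    return [(x, y, w / total) for x, y, w in pooled]


def fit_stump(data):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(data[0][0])):
        for thr in sorted({x[j] for x, _, _ in data}):
            for polarity in (1, -1):
                # Weighted error of predicting `polarity` when x[j] >= thr.
                err = sum(w for x, y, w in data
                          if (polarity if x[j] >= thr else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity


def adaboost(data, rounds=10):
    """Discrete AdaBoost whose initial distribution is the instance weighting."""
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(data)
        err = max(stump[0], 1e-10)  # avoid log(1/0) for a perfect stump
        if err >= 0.5:              # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase weights of misclassified instances, then renormalize.
        data = [(x, y, w * math.exp(-alpha * y * stump_predict(stump, x)))
                for x, y, w in data]
        z = sum(w for _, _, w in data)
        data = [(x, y, w / z) for x, y, w in data]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

For example, `model = adaboost(seasonal_training_set(target, sources, "summer"))` trains a summer model, and `predict(model, x)` returns +1 or -1. A fixed source weight is the simplest instance-weighting scheme; methods such as TrAdaBoost instead adjust source-instance weights during boosting.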
