
CWAAP: An Authorship Attribution Forensic Platform for Chinese Web Information

Keywords
  • CWAAP
  • Authorship Attribution
  • Forensic
  • Support Vector Machine
  • Chinese
  • Web Information
  • Criminology
  • Law


Illegal web information is common on the Internet. One effective way to curb it is to give courts sound evidence so that offenders can be punished under the law. This paper describes CWAAP, an authorship attribution platform for Chinese web information. Based on the linguistic characteristics of Chinese web information, the platform extracts lexical and structural features that express an author’s writing habits, and uses support vector machines (SVM) to learn each author’s writing features. To test the effectiveness of CWAAP, five experiments are performed on literature, blog and BBS datasets. The results show that lexical and structural features are effective; that training samples should contain more than 200 words; that 800 lexical features selected by Information Gain are enough to express the authors’ writing style; that differences between the authors’ topics have only a small effect; and that retaining all parts of speech gives the best results. These results confirm that the platform is effective and feasible for cybercrime forensics.
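
The abstract names Information Gain as the feature selection method but does not say how it is computed. The sketch below uses the standard textbook definition, IG(t) = H(C) - H(C | t), where C is the author label and t is the presence or absence of a term in a sample. It is an illustration under stated assumptions, not CWAAP's implementation: the function names, the binary presence feature, and the toy English tokens are all hypothetical, and real Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the sample'.

    docs   -- list of token sets, one per sample (assumed pre-segmented)
    labels -- author label of each sample, in the same order as docs
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy data (hypothetical): two samples each from authors "A" and "B".
docs = [
    {"well", "indeed", "the"},
    {"indeed", "the", "so"},
    {"lol", "the", "so"},
    {"lol", "the", "well"},
]
labels = ["A", "A", "B", "B"]

selected = top_k_terms(docs, labels, 2)
```

In this toy data, "indeed" and "lol" each occur for only one author, so they carry the most information about authorship, while "the" occurs in every sample and carries none. A platform like the one described would keep the top-ranked terms (the abstract reports 800) and pass their counts or frequencies to the SVM as lexical features.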
