Association Rule Analysis for Crime Data

This is an SJSU course project. The goal is to generate association rules from crime data and see whether they reveal any problems related to criminal activity.

  1. The data

The data was obtained from the official DataSF website.


  1. The features

Only a few features are useful for association rule generation: Category, Descript, DayOfWeek, Date/Time, PDDistrict, Resolution and Address. Below is some sample data.

  1. The data preprocessing

Some of the features above are similar to each other, and others are not yet abstracted to a useful level, so we pre-process the data before moving on to the algorithm step. We drop the features that are not useful and add some higher-level features, such as Monday, Month, Afternoon, Hour, etc. Here is the final data sample.

We prepend the feature name to each data value so that every value of one feature is distinct from every value of any other feature. This ensures that identical raw values from different features are not wrongly counted as the same item.
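As a rough illustration of this step (not the project's own code), the sketch below turns one record into a set of "Feature=value" items. The record itself, the `DateTime` key, the helper names `time_category` and `to_items`, and the hour boundaries for each time category are all assumptions made for the example. Without the prefix, a month of 5 and an hour of 5 would both be the bare value "5" and would be counted as the same item.

```python
from datetime import datetime

def time_category(hour):
    """Map an hour (0-23) to a coarse label. The boundaries are assumed."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "night"

def to_items(record):
    """Turn one raw record (a dict) into a set of 'Feature=value' items."""
    when = datetime.strptime(record["DateTime"], "%Y-%m-%d %H:%M:%S")
    features = {
        "Category": record["Category"],
        "DayOfWeek": record["DayOfWeek"],
        "PDDistrict": record["PDDistrict"],
        "Resolution": record["Resolution"],
        # Higher-level features derived from the timestamp.
        "Month": when.month,
        "Time Hour": when.hour,
        "Time cat": time_category(when.hour),
    }
    # Prefix every value with its feature name so items never collide.
    return {f"{name}={value}" for name, value in features.items()}

# A made-up record, used only to show the shape of the output.
record = {
    "Category": "VEHICLE THEFT",
    "DayOfWeek": "Monday",
    "PDDistrict": "MISSION",
    "Resolution": "NONE",
    "DateTime": "2015-05-11 21:30:00",
}
items = to_items(record)
```

Each record becomes one transaction, and the list of all such sets is what the rule-mining step consumes.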

  1. The apriori algorithm

The implementation is based on this document,

We will not go into the details of the algorithm, as the document above explains it thoroughly. In short, the algorithm counts pairs of feature values to find the frequent sets, and then uses the support and confidence thresholds to filter out pairs with low support or confidence. The remaining pairs are used to generate the final association rules. Here is the sample code.
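As a hedged illustration of the counting and filtering described above (not the project's own code), the sketch below handles only single items and pairs. Here support is the fraction of transactions that contain a pair, and confidence is the pair count divided by the count of the left-hand item. The function name `pair_rules`, the threshold values and the toy transactions are assumptions for the example, and the toy data is not real output.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) rules built from frequent pairs."""
    n = len(transactions)
    # Pass 1: count single items and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count pairs made only of frequent items (Apriori pruning).
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(frequent & t), 2):
            pair_counts[pair] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Try the rule in both directions and keep the confident ones.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Toy transactions, invented for illustration only.
transactions = [
    {"Category=VEHICLE THEFT", "Resolution=NONE", "Time cat=night"},
    {"Category=VEHICLE THEFT", "Resolution=NONE", "Time cat=afternoon"},
    {"Category=ASSAULT", "Resolution=ARREST, BOOKED", "Time cat=night"},
    {"Category=VEHICLE THEFT", "Resolution=NONE", "Time cat=night"},
]
rules = pair_rules(transactions)
```

A full Apriori implementation would go on to build larger itemsets from the frequent pairs; stopping at pairs keeps the sketch short.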

  1. The result

Below is part of the screen output.

This part of the result suggests that the algorithm works correctly: the last rule shows that “Time Hour=21” implies “Time cat=night”. However, such a rule is useless, because it is obvious and both features are man-made.

After filtering out those man-made features, we got some more insightful rules, shown below:

These rules reveal deeper social phenomena than the previous “hour 21 -> Night” rule. For example, {‘Category=VEHICLE THEFT’} almost always leads to {‘Resolution=NONE’}. It suggests that the police have no effective solution for this type of criminal activity.
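One possible way to do this filtering is sketched below. It assumes a rule is stored as a pair of item sets and that a rule counts as trivial when every item on both sides comes from a man-made feature. The names `DERIVED_FEATURES`, `feature_of` and `is_trivial` are hypothetical, and the two example rules are the ones mentioned in this post.

```python
# Assumed list of man-made (derived) feature names.
DERIVED_FEATURES = {"Time Hour", "Time cat"}

def feature_of(item):
    """Return the feature name from a 'Feature=value' item."""
    return item.split("=", 1)[0]

def is_trivial(antecedent, consequent):
    """True when every item in the rule belongs to a derived feature."""
    return all(feature_of(i) in DERIVED_FEATURES for i in antecedent | consequent)

rules = [
    ({"Time Hour=21"}, {"Time cat=night"}),
    ({"Category=VEHICLE THEFT"}, {"Resolution=NONE"}),
]
# Keep only the rules that involve at least one original feature.
kept = [rule for rule in rules if not is_trivial(*rule)]
```

A rule that mixes a derived feature with an original one, such as a time category together with a crime category, is kept under this assumption, since it may still carry real information.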
