DATA WAREHOUSES IN CRIMINAL INVESTIGATION


The lack of well-designed knowledge management systems in an intensive, time-critical law enforcement environment poses interesting problems for information technology professionals. The objective of this work is to study how the different databases of the Police Department of the Commonwealth of Puerto Rico (PRPD) can be integrated. We propose creating a data warehouse to store relevant criminal data that will be useful for preventing crime, locating criminals and bringing them to justice. The Criminal Information Support System (CISS) project integrates multiple systems that have different functionalities. CISS is a cost-effective, web-based system that allows law enforcement departments to share information from different data sources. We present a descriptive study of the proposed data repository, which seeks to improve the administration of criminal data in several ways: developing effective crime prevention plans, finding useful patterns of criminal behavior, and correlating similar circumstances in order to solve very difficult cases and thus avoid more casualties. This is the first stage of a larger project that includes online analytical processing (OLAP) tools suitable for advanced extractions and a decision support system.

The problem of collaboration and information exchange between police units in Puerto Rico has gradually grown over the years. The evolution of technology has changed the way criminals break the law, and they are further ahead of law enforcement agencies than ever because of the lack of an efficient collaboration mechanism. The investigation process at the PRPD follows several steps. A police officer receives a call and goes to the place where the incident occurred. The officer then gathers information at the scene, makes an arrest if necessary and completes a report. A detective receives the case, reviews it, and conducts further investigation and interviews. When appropriate, more arrests and bookings follow, and the case is then ready for trial. The PRPD does not currently have integrated information systems, but it has begun implementing a centralized analysis unit. After receiving the call, the police officer notifies the PRPD central office, which is also in charge of receiving calls from citizens. At the central office, officers are assigned to cases, data on criminal incidents are entered into the available information systems and stored in databases, and referrals are made if the situation requires it.

How the PRPD manages a criminal incident depends on the type of crime. When a traditional crime is committed, the call is recorded at the central office and the steps described above are followed until the case reaches trial. If the crime is classified as a cybercrime, it is referred to the Federal Bureau of Investigation (FBI), although the PRPD has begun to set up its own cybercrime unit. The process that collects the criminal information used by officers and detectives has no effective integration and collaboration system. The many databases used for this purpose are not interconnected, so time-critical information is often incomplete, which makes it very difficult to bring criminals to justice. Figure 1 presents the process of management of criminal incidents in the PRPD.

Figure 1. Process of management of criminal incidents in the PRPD.

In law enforcement, as in many other fields, collaboration among the members of dynamically defined, task-oriented teams plays a very important role in daily operations. An effective collaborative project needs a system that collects and processes data in a standardized manner so that any agency can retrieve and use it. In 1997 the National Institute of Justice launched a project named COPLINK, in partnership with the Artificial Intelligence Laboratory of the University of Arizona, as a system for exchanging information within a local department and with other agencies. Through this system, stored information can be exchanged between officers and other security agencies almost in real time. COPLINK is a suite of tactical, line-level solutions to the problem of inaccessible or irretrievable information that results from disparate police information systems lacking a common language or platform.

Figure 2. Coplink system architecture.

Designing a data warehouse is often a great challenge for developers. It affects many business areas of an organization that handle a workload of thousands of daily transactions. A data warehouse can be defined as an integrated repository of information that is available for queries and analysis. In other words, it stores data for the queries that matter most to a particular company or business. A data warehouse provides an infrastructure that allows organizations to extract, clean and store large amounts of corporate data from operational systems so that user queries can be answered efficiently and accurately. It also equips knowledge workers with information that lets them make decisions based on a solid foundation of facts.

Data warehouses support online analytical processing (OLAP) and are suited to answering analytical questions that involve aggregation, drill-down, and slicing and dicing of data. Many methodologies and data warehousing tools are available to support the growing demand for these systems. Data warehouse implementation activities include data sourcing, data staging (ETL) and the development of decision-support-oriented end-user applications. Instead of starting from the requirements, the development of the data warehouse must be driven by data: the data are first collected, then integrated and then tested.
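The analytical operations mentioned above (aggregating, breaking totals down to finer detail, and cutting the data along one dimension) can be illustrated with a minimal Python sketch. The incident figures, dimension names and helper functions are invented for illustration and are not taken from PRPD data.

```python
from collections import defaultdict

# Hypothetical incident facts: (municipality, crime_type, year, incident_count).
INCIDENTS = [
    ("San Juan", "burglary", 2017, 120),
    ("San Juan", "burglary", 2018, 95),
    ("San Juan", "assault", 2018, 40),
    ("Ponce", "burglary", 2018, 30),
    ("Ponce", "assault", 2017, 25),
]

# Position of each dimension inside a fact tuple.
DIMENSIONS = {"municipality": 0, "crime_type": 1, "year": 2}


def aggregate(rows, by):
    """Sum incident counts grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[name]] for name in by)
        totals[key] += row[3]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[DIMENSIONS[dimension]] == value]


# Roll-up: totals per municipality.
by_municipality = aggregate(INCIDENTS, ["municipality"])
# Drill-down: break each municipality down by crime type.
by_municipality_and_type = aggregate(INCIDENTS, ["municipality", "crime_type"])
# Slice: fix year = 2018, then aggregate by crime type.
by_type_2018 = aggregate(slice_rows(INCIDENTS, "year", 2018), ["crime_type"])
```

A real OLAP tool would run these operations against the warehouse itself; the sketch only shows what each operation does to the data.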

There are multiple approaches to building data warehouses. The strategies available for designing the architecture range from an enterprise-wide data warehouse design to a data mart design. The organization must determine which approach is most appropriate before adopting a methodology. The data mart design consists of several departmental or local data marts that are combined into a data warehouse. It is quicker and easier to implement because it proceeds in manageable parts with less risk of failure. For CISS, the data marts are intended for the different types of information collected from the different databases used by law enforcement agencies.

The system will be called the Criminal Information Support System (CISS). CISS will collect data held in different databases and consolidate them in a single repository that can be queried to find patterns, correlations, criminal information and incidents using a single tool. The PRPD has the following databases, which can only be accessed from the main office.

  1. CAD system (Positron), which captures incidents in Microsoft SQL Server.
  2. Separate system for mugshots (photos taken when the court found cause for arrest).
  3. Analysis and Statistics System (SAEC), which collects all the main incidents as a journal in a MySQL database.
  4. Incident Detention System (SADIC), which holds cases for presentation to the court in a MySQL database.

The following steps must be performed to prepare the data source profiles.

1. Identification of data sources inside and outside the organization and examination of the data format. The PRPD has eight relevant data sources.

  • POSITRON: format=SQL Server
  • SAEC: format=MySQL
  • SADIC: format=MySQL
  • DRUGS: format=MySQL (under construction)
  • WEAPON REGISTRY: format=SQL Server
  • MUGSHOT: format=SQL Server
  • VEHICLE: format=SQL Server
  • SEXUAL OFFENDERS: format=SQL Server

2. Location of the data of interest. The data in these sources have a similar syntax and are compatible. This step identifies similarities between data types, missing values and values with inconsistent data types. A conceptual data model has to be prepared to verify the facts and dimensions, and to design the data mart schema that best suits the data warehouse. Facts are quantitative data about a business transaction, that is, an event for which we need to capture and store data. Fact data are more stable than dimensional data, because dimensional data change more frequently over time.
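The profiling step described above, which looks for missing values and inconsistent data types in each source, can be sketched as follows. The sample rows, column names and the `profile` helper are hypothetical; real rows would be read from the source databases.

```python
from collections import Counter

# Hypothetical rows as they might arrive from two of the sources.
SAEC_ROWS = [
    {"incident_id": 1, "date": "2018-03-01", "municipality": "Ponce"},
    {"incident_id": 2, "date": None, "municipality": "San Juan"},
]
POSITRON_ROWS = [
    {"incident_id": "3", "date": "2018-03-02", "municipality": "Ponce"},
    {"incident_id": 4, "date": "2018-03-02", "municipality": ""},
]


def profile(rows):
    """Report, per column, the missing-value count and the Python types seen."""
    report = {}
    for row in rows:
        for column, value in row.items():
            entry = report.setdefault(column, {"missing": 0, "types": Counter()})
            if value is None or value == "":
                entry["missing"] += 1
            else:
                entry["types"][type(value).__name__] += 1
    return report


def inconsistent_columns(report):
    """Columns whose non-missing values have more than one type."""
    return sorted(c for c, e in report.items() if len(e["types"]) > 1)


saec_profile = profile(SAEC_ROWS)
positron_profile = profile(POSITRON_ROWS)
```

Here `inconsistent_columns(positron_profile)` flags `incident_id`, because that source mixes text and integer identifiers.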

Figure 3. Process of collaboration in a law enforcement agency.

The staging area model, often called the extract-transform-load (ETL) model, is used to extract data from the source systems and transform the different source data standards into a single one. It is considered the core of the data warehousing project, because how well a data warehouse works with a decision support tool depends on how good the data standardization is. Staging is necessary to enforce data quality and consistent standards for data integration, so that separate sources can be used together and data can be loaded successfully into the data marts.
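As a minimal sketch of the transformation step, the following Python code maps records from two sources onto one common standard. The column names and date formats in `SOURCE_RULES` are assumptions made for illustration; the actual PRPD schemas are not described in this article.

```python
from datetime import datetime

# Hypothetical per-source conventions; real column names and formats will differ.
SOURCE_RULES = {
    "POSITRON": {"fields": {"IncidentNo": "incident_id", "OccurredOn": "date"},
                 "date_format": "%m/%d/%Y"},
    "SAEC": {"fields": {"id_incidente": "incident_id", "fecha": "date"},
             "date_format": "%d-%m-%Y"},
}


def transform(source, record):
    """Rename fields to the common names and rewrite the date as ISO 8601."""
    rules = SOURCE_RULES[source]
    common = {rules["fields"][name]: value for name, value in record.items()
              if name in rules["fields"]}
    parsed = datetime.strptime(common["date"], rules["date_format"])
    common["date"] = parsed.date().isoformat()
    # Store identifiers as text so that every source uses the same type.
    common["incident_id"] = str(common["incident_id"])
    common["source"] = source
    return common


staged = [
    transform("POSITRON", {"IncidentNo": 1001, "OccurredOn": "03/15/2018"}),
    transform("SAEC", {"id_incidente": "77", "fecha": "15-03-2018"}),
]
```

Both staged records end up with the same field names and the same date format, which is what lets separate sources be used together later.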

Azhar suggests using data profiling for this examination, since it is a systematic examination of the content, structure and quality of a data source. As proposed by Azhar, we will use the following steps: (1) extraction and transformation, (2) validation, filtering and correction, and (3) integration, which handles data of the same type as well as data of different types.

Data validation, filtering and integration are important steps in the staging process. The purpose of validation is to guarantee the quality of the data and to correct errors, omissions or inaccuracies before the data are loaded into the data marts.

Defective records are corrected before moving on to the next record. This process has to follow additional rules for the accurate filtering of criminal data from each data source once the validation, filtering and integration steps have been performed. The purpose of data integration is to gather data from different sources according to subject area. It is important to note that for this project the process must run in real time, because the criminal data are updated every hour.
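The validation, correction and filtering steps can be sketched in Python as below. The required fields and the correction rules are illustrative assumptions; the rules the PRPD would actually apply to each source are not given in this article.

```python
from datetime import date

# Hypothetical list of fields every staged record must carry.
REQUIRED_FIELDS = ("incident_id", "date", "municipality")


def validate(record):
    """Return a list of problems found in one staged record."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    if record.get("date"):
        try:
            date.fromisoformat(record["date"])
        except ValueError:
            problems.append("bad date")
    return problems


def correct(record):
    """Apply simple, safe corrections: trim whitespace and normalize case."""
    fixed = dict(record)
    for name in ("municipality", "date"):
        if isinstance(fixed.get(name), str):
            fixed[name] = fixed[name].strip()
    if fixed.get("municipality"):
        fixed["municipality"] = fixed["municipality"].title()
    return fixed


def filter_valid(records):
    """Correct each record, then split into loadable and rejected records."""
    accepted, rejected = [], []
    for record in records:
        fixed = correct(record)
        problems = validate(fixed)
        if problems:
            rejected.append((fixed, problems))
        else:
            accepted.append(fixed)
    return accepted, rejected


accepted, rejected = filter_valid([
    {"incident_id": "1", "date": "2018-03-15 ", "municipality": " ponce"},
    {"incident_id": "2", "date": "15/03/2018", "municipality": "San Juan"},
    {"incident_id": "3", "date": "2018-03-16", "municipality": ""},
])
```

Each record is corrected first and validated second, so a record that only needs trimming is loaded, while records that cannot be repaired automatically are set aside together with the reason.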

After the integration process, the data must fit the selected data mart schema. The best approach to schema design here is the star schema, a dimensional model composed of a central fact table and a set of surrounding dimension tables. A fact table is a specialized relation with a multi-attribute key, and its other attributes usually hold numeric, additive values. A dimension table has a single-attribute primary key that corresponds to one of the attributes of the fact table's multi-attribute key.
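A small star schema of this kind can be sketched with Python's built-in `sqlite3` module. SQLite is used only so that the example is self-contained; the table names, columns and sample rows are assumptions and do not reproduce the CISS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, municipality TEXT);
CREATE TABLE dim_crime (crime_id INTEGER PRIMARY KEY, crime_type TEXT);
-- Fact table: the composite key is made of the three dimension keys,
-- and incident_count is a numeric, additive measure.
CREATE TABLE fact_incident (
    date_id INTEGER REFERENCES dim_date(date_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    crime_id INTEGER REFERENCES dim_crime(crime_id),
    incident_count INTEGER NOT NULL,
    PRIMARY KEY (date_id, location_id, crime_id)
);
INSERT INTO dim_date VALUES (1, 2018, 3), (2, 2018, 4);
INSERT INTO dim_location VALUES (1, 'San Juan'), (2, 'Ponce');
INSERT INTO dim_crime VALUES (1, 'burglary'), (2, 'assault');
INSERT INTO fact_incident VALUES (1, 1, 1, 12), (1, 2, 1, 4), (2, 1, 1, 9), (2, 1, 2, 5);
""")

# Join the fact table to two dimensions and add up the additive measure.
totals = conn.execute("""
    SELECT l.municipality, c.crime_type, SUM(f.incident_count)
    FROM fact_incident AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_crime AS c ON c.crime_id = f.crime_id
    GROUP BY l.municipality, c.crime_type
    ORDER BY l.municipality, c.crime_type
""").fetchall()
```

Because every dimension table is joined directly to the fact table, any combination of dimensions can be queried with the same simple join pattern.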

A prototype of the data platform was created in MySQL to verify the metadata framework. The metadata is used in the data extraction and loading process to map the data sources to the common view of the information within the data mart. It is also used in the query management process to direct a query to the appropriate data source.
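The two roles of the metadata, mapping source columns to the common view and directing a query to the right source, can be sketched with a plain Python dictionary. The field and column names in `METADATA` are hypothetical, and the real framework was prototyped in MySQL rather than in Python.

```python
# Hypothetical metadata: for each common-view field, where it lives in each source.
METADATA = {
    "incident_id": {"POSITRON": "IncidentNo", "SAEC": "id_incidente"},
    "arrest_date": {"SADIC": "fecha_arresto"},
    "plate": {"VEHICLE": "PlateNumber"},
}


def sources_for(field):
    """Route a query: which sources can answer a question about this field?"""
    return sorted(METADATA.get(field, {}))


def to_common_view(source, record):
    """Map one source record onto the common field names used in the data mart."""
    reverse = {columns[source]: field
               for field, columns in METADATA.items() if source in columns}
    return {reverse[name]: value for name, value in record.items() if name in reverse}


routed = sources_for("incident_id")
mapped = to_common_view("SAEC", {"id_incidente": 77, "nota": "n/a"})
```

Columns that the metadata does not describe, such as `nota` in this example, are simply left out of the common view.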

These are the main design criteria considered for the CISS project that are adapted from those considered for COPLINK.

Data warehousing is a valuable alternative to traditional approaches for integrating and accessing data from autonomous, heterogeneous information sources. The warehousing approach is particularly useful when high query performance is desired, or when information sources are expensive to access or transient. The proposed data repository aims to integrate different data sources that contain large amounts of criminal data, while improving collaboration and information exchange between police departments within Puerto Rico and with the United States. The result will be a system that promotes the exchange of information among the different agencies' information sources and captures the connections between people, places, events and vehicles, based on historical data. The following are future plans derived from the analysis of the proposed system.

  1. Connect the data warehouse to a decision support system to track patterns, correlations and clusters.
  2. Design a web application that is easy for every employee, civilian or officer, to use to search for criminal information efficiently.
  3. Explore text mining approaches that support knowledge retrieval from sources such as law enforcement case reports.
  4. Add an integrated multimedia database system, connected to online analytical tools, to promote information exchange for criminal intelligence analysis.

Network analysis is important for understanding the structure and organization of criminal enterprises. Advanced, automated techniques and tools are needed to extract knowledge about criminal networks efficiently and effectively. This approach is a first step towards that goal, because data warehousing technology includes OLAP tools that are suitable for advanced extraction and analysis. In the context of cybercrime investigation, this system can be applied to examine patterns of Internet use and to recognize writing styles in email messages, among other tasks. What is still missing is an adaptation of existing laws and prosecution rulings to recognize the validity of cybercrime investigations, which could lead to more apprehensions and imprisonment of these offenders. Finally, the evaluation of these knowledge management and intelligence analysis applications demonstrates their potential to transform law enforcement practices in this era of digital government.
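As a very simple illustration of network analysis on warehouse data, the sketch below counts how often two people are named in the same incident. Co-occurrence counting is a basic stand-in chosen for this example, not the technique used by CISS or COPLINK, and the incident identifiers and person labels are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical incident reports listing the people named in each one.
INCIDENT_PEOPLE = {
    "I-1": ["A", "B", "C"],
    "I-2": ["A", "B"],
    "I-3": ["C", "D"],
}


def co_occurrence(incident_people):
    """Count how many incidents each pair of people appears in together."""
    pairs = Counter()
    for people in incident_people.values():
        for pair in combinations(sorted(set(people)), 2):
            pairs[pair] += 1
    return pairs


def degree(pairs):
    """Number of distinct associates per person."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {person: len(linked) for person, linked in neighbors.items()}


links = co_occurrence(INCIDENT_PEOPLE)
associates = degree(links)
```

Pairs with high counts and people with many associates are natural starting points for an analyst, although real link analysis would also weigh the type and date of each incident.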

Figure 4. Suggested architecture for the data warehouse proposed for CISS.

Bibliographic References and Source of the images:

  1. Web Site: https://libra.unine.ch/export/DL/Fabrizio_Albertetti/18540.pdf 
  2. Web Site: http://iacis.org/iis/2011/445-454_AL2011_1745.pdf 

I hope you enjoyed the content, see you soon.
