Information Assurance in a Distributed Forensic Cluster

When digital forensics began in the mid-1980s, most of the software used for analysis was borrowed from tools intended for writing and debugging software. Amongst these tools was the UNIX utility ‘dd’, which was used to create an image of an entire storage device. Over the next decade the practice of creating and using ‘an image’ became established as a fundamental basis of what we call ‘sound forensic practice’. By virtue of its structure, every file within the media was an integrated part of the image, and so we were assured that it was a complete representation of the digital crime scene. In an age of terabyte media, ‘the image’ is becoming increasingly cumbersome to process, simply because of its size. One solution lies in the use of distributed systems. However, the data assurance inherent in a single media image file is lost when data is stored in separate files distributed across a system.

Introducing FCluster

FCluster is a middleware that provides an environment to allow forensic data processing to proceed with assurance. It is a means by which data integrity can be controlled; it is not an application program. The design is based on the following assumptions, derived from current practice and technological developments.

  • Media will continue to grow in capacity and quantity.
  • That ‘cross-drive’ forensics will be of increasing importance as individuals own many storage devices, crime becomes more organised, and forensic analysis systems increasingly have to address multi-agency interests.
  • That multi-agency investigation will increase but will experience problems sharing and transferring large datasets.
  • That, because of the above, the notion of the ‘image’ is becoming untenable, although it will need to be supported for some time as a legacy. Instead, evidential data will need to be stored as separate files across the system. Consequently, we must expect tens, if not hundreds, of millions of files in a system that stores the contents of many forensic images.
  • That in most investigations the evidence is found in ‘the obvious place’, and that most investigations are a matter of locating and recording the data found.
  • The system should allow existing legacy software to run where possible. It should not require expert programming skills, or knowledge of devices such as GPUs, to gain access to huge processing power. However, if such programs are developed, the system should allow these as well.
  • That the quantity of data requires the development of more automated tools. These can be simple reporting or correlating tools, but they need to run against large datasets.

FCluster is a peer-to-peer middleware for a network of heterogeneous host computers. The prototype is built on Ubuntu Linux, with future development planned for Windows and macOS. Most of the code is written in C or as Bash scripts. It uses MySQL, libcurl, FTP servers and ntfs-3g.
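
The assurance problem that motivates this design can be made concrete with a short sketch. A single image file can be covered by one digest, but once evidence is held as separate files, each file needs its own recorded digest, and the set of files as a whole needs to be checked for omissions and additions. The Python below is illustrative only and is not FCluster's mechanism, which is not described in this section; the function names, the in-memory file mapping and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per file (hypothetical helper).

    `files` maps a path to the file's contents and stands in for
    evidence extracted from an image.
    """
    return {path: sha256_hex(content) for path, content in files.items()}


def manifest_digest(manifest: dict[str, str]) -> str:
    """Hash the manifest itself, so the set of files has a single digest.

    Paths are processed in sorted order so that the result does not
    depend on the order in which files were recorded.
    """
    h = hashlib.sha256()
    for path in sorted(manifest):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(manifest[path].encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()


def verify(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the sorted paths that are missing, altered or unexpected."""
    problems = []
    for path, digest in manifest.items():
        if path not in files or sha256_hex(files[path]) != digest:
            problems.append(path)
    for path in files:
        if path not in manifest:
            problems.append(path)
    return sorted(problems)
```

With a single image, one digest comparison answers the integrity question. With separate files, `verify` has to detect three distinct failures (a changed file, a lost file and an added file), which is the kind of bookkeeping a distributed store has to take on once the image is no longer the unit of storage.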