When digital forensics emerged in the mid-1980s, most of the software used for analysis was adapted from tools originally built for writing and debugging software. Amongst these tools was the UNIX utility ‘dd’, which was used to create an image of an entire storage device. Over the following decade the practice of creating and using ‘an image’ became established as a fundamental part of what we call ‘sound forensic practice’. By virtue of its structure, every file within the media was an integral part of the image, and so we were assured that it was a complete representation of the digital crime scene. In an age of terabyte media, ‘the image’ is becoming increasingly cumbersome to process, simply because of its size. One solution to this lies in the use of distributed systems. However, the data assurance inherent in a single media image file is lost when data is stored in separate files distributed across a system.
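The imaging practice described above might be sketched as follows. This is an illustrative example only, not a statement of any particular laboratory's procedure; for brevity it images a small zero-filled file where, in practice, `if=` would name a physical device such as `/dev/sdb`.

```shell
# Create a 4 MiB "image" from /dev/zero (a real acquisition would read a device).
dd if=/dev/zero of=evidence.img bs=1M count=4 conv=noerror,sync

# Record a cryptographic digest of the whole image at acquisition time.
sha256sum evidence.img > evidence.img.sha256

# Any later re-verification confirms the image is byte-for-byte unchanged.
sha256sum -c evidence.img.sha256
```

Because the digest covers the entire image as a single object, every file within the media is implicitly covered by one check, which is the assurance property that is lost once the data is split into separate files.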
FCluster is middleware that provides an environment in which forensic data processing can proceed with assurance. It is a means by which data integrity can be controlled; it is not an application program. The design is based on the following assumptions, derived from current practice and technological developments.
- Media will continue to grow in capacity and quantity.
- That ‘cross-drive’ forensics will be of increasing importance: individuals own many storage devices, crime is becoming more organised, and forensic analysis systems will increasingly have to address multi-agency interests.
- That multi-agency investigation will increase, but will experience problems in sharing and transferring large datasets.
- That, because of the above, the notion of the ‘image’ is becoming untenable, although it will be required for some time as a legacy format. Instead, evidential data will need to be stored as separate files across the system. Consequently we must expect tens, if not hundreds, of millions of files in a system that stores the contents of many forensic images.
- That in most investigations the evidence is found in ‘the obvious place’, and that most investigations are a case of locating and recording the data found.
- The system should allow existing legacy software to run where possible. This new system should not require guru programming skills, or knowledge of devices such as GPUs, to gain access to huge processing power; however, if such programs are developed, it should accommodate them as well.
- That the quantity of data requires the development of more automated tools. These can be simple reporting or correlating tools, but they need to run against large datasets.
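One way to retain integrity control once evidential data is held as millions of separate files, rather than one image, is a per-file digest manifest. The sketch below is a minimal illustration of that idea under stated assumptions (SHA-256 digests, a flat list of file paths); it is not a description of FCluster's actual mechanism, and the function names are hypothetical.

```python
import hashlib


def hash_file(path: str) -> str:
    """SHA-256 digest of a single file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths: list[str]) -> dict[str, str]:
    """Record each file's digest at the time it enters the system."""
    return {p: hash_file(p) for p in paths}


def verify_manifest(manifest: dict[str, str]) -> list[str]:
    """Return the paths whose current digest no longer matches the manifest."""
    return [p for p, digest in manifest.items() if hash_file(p) != digest]
```

A manifest of this kind restores, file by file, the assurance that a single image file provided wholesale, at the cost of managing digests for every stored object.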