In general terms, data compression means reducing the size of a file using specialized tools and techniques. It can be applied to virtually any file: a program, a video, audio, an image, a document and so on. It is done to save disk space or, in the case of streaming media, bandwidth (Casey, 2011). A compressed file can contain both compressed and uncompressed data. Data compression can be broadly classified into two types:
a) Lossless compression
Lossless compression uses an algorithm that does not discard any data. In other words, compression does not affect the quality of the file: only the stored form is smaller, and decompressing it yields a file identical to the original, with the original size and quality. One of the best-known lossless compression methods is LZW. Deflate is another widely used method; it is not a variation of LZW but combines LZ77 dictionary matching with Huffman coding. WinZip, ALZip and most other ZIP-based compression tools use the Deflate method (Ladino, n.d.). WinRAR, a popular commercial tool for file compression, uses its own RAR format and algorithm by default rather than Deflate.
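The lossless property described above can be checked with a short round trip. The following is a minimal sketch using Python's standard zlib module, which implements Deflate; the sample data is made up for illustration.

```python
import zlib

# Hypothetical sample data: repetitive text, which compresses well.
original = b"forensic evidence " * 200

compressed = zlib.compress(original)    # Deflate-based compression
restored = zlib.decompress(compressed)  # reverse the compression

print(len(original), len(compressed))   # original size vs. compressed size
print(restored == original)             # lossless: the round trip is exact
```

The compressed form is much smaller, yet decompression returns exactly the bytes that went in.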
b) Lossy compression
Lossy compression uses an algorithm that discards some of the data. In other words, when the file is decompressed, the result is not identical to the original because some information has been permanently lost. MPEG (MPG) video and JPEG images are examples of lossy compression (Ladino, n.d.).
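To illustrate why lossy compression cannot restore the original exactly, here is a minimal sketch of quantization, one ingredient of lossy formats such as JPEG. It is not the JPEG or MPEG algorithm itself; the function names, sample values and step size are assumptions chosen for illustration.

```python
def quantize(samples, step):
    """Keep only the nearest multiple of `step` for each sample (as an index)."""
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    """Rebuild approximate samples from the stored indices."""
    return [c * step for c in codes]

samples = [13, 14, 15, 200, 203, 90]   # hypothetical pixel or audio values
codes = quantize(samples, step=8)      # smaller numbers, fewer distinct values
restored = dequantize(codes, step=8)

print(codes)                 # [2, 2, 2, 25, 25, 11]
print(restored)              # [16, 16, 16, 200, 200, 88]
print(restored == samples)   # False: the discarded detail cannot be recovered
```

The restored values are close to the originals but not identical, which is the data loss described above.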
How data compression works
The LZW algorithm works by building a dictionary of repeated patterns as it reads the file and replacing later occurrences of those patterns with references to the dictionary. A piece of information that is repeated many times throughout the file therefore need not be stored in full each time; it is replaced by a short reference. This algorithm is most efficient for files with a lot of repetition, such as documents and monochromatic images, and it is used in image formats such as GIF (Graphics Interchange Format) and TIFF (Tagged Image File Format) (Casey, 2011).
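The dictionary idea can be sketched in a few lines. The following is a simplified, textbook-style LZW that emits a list of integer codes; real formats such as GIF additionally pack the codes into variable-width bit fields and reset the dictionary, which is omitted here. The function names are illustrative.

```python
def lzw_compress(data: bytes) -> list:
    # Start with one dictionary entry per possible byte (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])        # emit reference to known pattern
            dictionary[candidate] = next_code        # learn the new, longer pattern
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes: list) -> bytes:
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Pattern defined and used in the same step (the "KwKwK" case).
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]  # rebuild the same dictionary
        next_code += 1
        previous = entry
    return b"".join(output)

data = b"ABABABABABAB"
codes = lzw_compress(data)
print(codes)                          # [65, 66, 256, 258, 257, 260]
print(lzw_decompress(codes) == data)  # True
```

Twelve input bytes become six codes, because the repeated "AB" patterns are replaced by references to dictionary entries. The decompressor rebuilds the same dictionary from the codes alone, so the dictionary itself never has to be stored.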
Implications of compressed files for forensics
If a document is stored on a computer in a compressed state, it stays compressed until it is deleted, and whatever remains on disk after deletion is still compressed. Searching a disk for a keyword that exists only inside a deleted compressed file would therefore not yield any result, because the keyword is not present on disk in its plain form. Likewise, because the data within a compressed file is preceded by length information, the original file header no longer sits where a header search expects it, so a search for a file header would also be unsuccessful. A digital forensic investigator dealing with deleted compressed files must choose a tool that supports the compression in use (for instance, EnCase Forensic). If the tool does not support it, the compressed file will not be presented as it really is. For instance, suppose a 64 KB file has been compressed to 12 KB and, in its compressed state, occupies 10 clusters (a cluster is the unit of allocatable storage space on a hard disk). The MFT record gives the file size as 64 KB, which does not fit in 10 clusters, so a tool that is unaware of compression may report the area following the file as belonging to it, when in fact it is not part of the file and is slack space. If the forensic tool does not support compression, it may report the file incorrectly (Sanderson, 2002).
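The keyword-search problem can be demonstrated with a small experiment. This is a minimal sketch that uses Python's zlib (Deflate) as a stand-in for whatever compression is actually in use; the document text and the keyword are made up.

```python
import zlib

keyword = b"XYZ"
# Hypothetical document contents containing the keyword once.
document = b"Meeting notes: the shipment code is XYZ. " + b"Routine filler text. " * 50

compressed = zlib.compress(document)

found_in_plain = keyword in document                       # raw search, uncompressed
found_in_compressed = keyword in compressed                # raw search, compressed bytes
found_after_decompress = keyword in zlib.decompress(compressed)

print(found_in_plain, found_in_compressed, found_after_decompress)
```

A raw byte search finds the keyword in the plain document and again after decompression, but in practice not in the compressed bytes, because the text is stored there as encoded bit patterns rather than as the literal characters. This is why the tool must understand the compression before it searches.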
Because compressed files are used so widely in day-to-day operations, they carry significant importance in digital forensics. The investigator's aim is to recover the original data, or at least enough of it that it can be decompressed and examined, so that the material can be identified and something useful obtained from it. To illustrate the difference between a regular file and a compressed file, take a case where a Microsoft document is to be searched for the text 'XYZ'. Knowing the file signature, it is fairly easy to locate the document and find the text within it using a hex editor. With a compressed file, however, the investigator may not know the type of the file inside, since it can be of any format, and the text is not stored in readable form. A hex editor therefore does not give the same result as it does for a regular file.
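Identifying a file by its signature can be sketched as a lookup of the leading "magic" bytes. The table below lists a few well-known signatures; it is far from complete, and the function name and sample inputs are illustrative assumptions.

```python
import gzip

# A few well-known file signatures (leading bytes of the file).
SIGNATURES = {
    b"PK\x03\x04": "ZIP archive (also the container for .docx)",
    b"\x1f\x8b": "gzip",
    b"Rar!\x1a\x07": "RAR archive",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy Microsoft Office)",
}

def identify(header: bytes) -> str:
    """Return a description of the file type based on its first bytes."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

print(identify(gzip.compress(b"some evidence")))   # gzip
print(identify(b"PK\x03\x04" + b"\x00" * 26))      # ZIP archive (also the container for .docx)
print(identify(b"plain text with XYZ inside"))     # unknown
```

The signature reveals only the container. What is stored inside a compressed container becomes visible only after decompression, which is why a hex editor view of the compressed bytes is of limited use.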