
File Splitter (Part 1 Of 2 Parts)


Posted by Luthfi, 02 May 2012 - 01:58 AM

Overview

A post in The Lounge asked for a file splitting algorithm. Visit this thread to see how the problem was solved.

Actually, file splitting is quite interesting. Sometimes a big file needs to be split into smaller parts, which makes it easier to manage when you transfer it on physical media whose capacity cannot hold the original file. This was especially important in the era when floppy disks dominated the world. For me personally, the file splitter project is my "testing ground" for learning how to work with streams in Delphi (i.e. descendants of the TStream class). As you will see in the source code, the file splitter relies heavily on streams.
 
Therefore, the primary goal of this tutorial is to show you how to work with TStream descendants; the secondary goal is to show you how to define your own simple file format.

The Basics
 
File Format

Before we split something apart, we have to make sure that we can reassemble the split parts to get the original back. To reassemble a split file perfectly, we need the following two pieces of information for each part.

  • Position, which specifies where the first byte of the part is located in the original file.
  • Number of bytes in the part that actually come from the original file.

Besides those, we also need to know the following, to make reassembling more reliable.

  • The name of the original file
  • The original's size for simple checking after the reassembling

Magic Number or File Signature
 
Basically, you put a constant magic number (file signature) somewhere in the first bytes of your file so that it can be recognized properly. With a magic number you don't rely on file extensions. Recall that file extensions are very easy to change, so a magic number is a more reliable way to recognize the format of a file. Visit this Wikipedia page for more information on magic numbers.
 
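In our case, we use the ANSI string 'SPLITTED' (without the quotes) as the magic number of our part files. It is exactly 8 characters long, matching the 8-character ID field of the headers described below.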
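To illustrate why the signature is more reliable than the extension, here is a minimal sketch (not part of the attached project) of a hypothetical helper, HasSplitSignature, that only looks at the first 8 bytes of a file. It assumes the VCL's TFileStream and SysUtils.CompareMem; the {$IFDEF FPC} line is an added assumption so that Free Pascal can also compile it in Delphi mode.

```pascal
program SignatureCheck;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

const
  // same bytes as MAGIC_NUMBER, typed so CompareMem can use it
  MAGIC: array[1..8] of AnsiChar = 'SPLITTED';

// Hypothetical helper: True when the file starts with the 8 ANSI
// characters 'SPLITTED'. The file extension is never examined.
function HasSplitSignature(const AFileName: string): Boolean;
var
  FS: TFileStream;
  vID: array[1..8] of AnsiChar;
begin
  Result := False;
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyNone);
  try
    // a file shorter than the signature cannot be one of our parts
    if FS.Read(vID, SizeOf(vID)) = SizeOf(vID) then
      Result := CompareMem(@vID, @MAGIC, SizeOf(MAGIC));
  finally
    FS.Free;
  end;
end;

var
  vDemo: TFileStream;
  vData: AnsiString;
begin
  // a "part" saved with a misleading .txt extension
  vData := 'SPLITTED' + #1;
  vDemo := TFileStream.Create('signature-demo.txt', fmCreate);
  try
    vDemo.Write(vData[1], Length(vData));
  finally
    vDemo.Free;
  end;
  WriteLn(HasSplitSignature('signature-demo.txt')); // TRUE
  WriteLn(HasSplitSignature(ParamStr(0)));          // FALSE (the program itself)
  DeleteFile('signature-demo.txt');
end.
```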


Implementing File Format

Our part files will consist of two sections.

  • Header, which contains the information explained above
  • Content, the real part obtained from the original file

Header

For the header, we define two types. One header is for the master file, i.e. the first part, and the other is for the rest of the parts. This avoids redundancy, since we don't have to store the original file name and size in every part.

For the master header we use a record like this.

type
  TMasterHeader=packed record
    // our magic number
    ID : array[1..8] of ansichar;
    Flags: Byte; // $01=first file of the group;
                 // $02=last file of the group
    DataPos : Cardinal;
    DataSize: Cardinal;
    // file name of the original file, without path
    // information
    FileName: String127;
    OriSize : Cardinal;
  end;

 
For the "ordinary" parts we use a header declared with the following record structure.

type
  TFileHeader=packed record
    // our magic number
    ID : array[1..8] of ansichar;
    Flags: Byte; // $01=first file of the group;
                 // $02=last file of the group
    DataPos : Cardinal;
    DataSize: Cardinal;
  end; 

 
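ID holds our magic number, which we declare as a constant. The two flag values stored in Flags (and used by the splitter code below) are declared next to it:


const
  MAGIC_NUMBER    = 'SPLITTED';
  FLAG_MASTERFILE = $01; // first file of the group
  FLAG_LASTFILE   = $02; // last file of the group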

 
String127 is declared like this. Because TMasterHeader uses it, this declaration must come before the header records:

type
  String127 = string[127];

 
Neither header stores the position of the part's content within the original file (DataPos is the offset of the content inside the part file itself). We can derive that position from the order of the parts, which is encoded in their file names.
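Because every part except the last one is completely filled, the position can be computed from the part index alone. The sketch below uses a hypothetical helper, PartOffset (not in the attached source), and assumes the parts were produced by Execute with a known DestSize. It also prints the header sizes that follow from the packed records above (a string[127] occupies 128 bytes, including its length byte).

```pascal
program PartOffsets;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  String127 = string[127];

  // same layouts as the tutorial's headers
  TMasterHeader = packed record
    ID: array[1..8] of AnsiChar;
    Flags: Byte;
    DataPos: Cardinal;
    DataSize: Cardinal;
    FileName: String127;
    OriSize: Cardinal;
  end;

  TFileHeader = packed record
    ID: array[1..8] of AnsiChar;
    Flags: Byte;
    DataPos: Cardinal;
    DataSize: Cardinal;
  end;

// Hypothetical helper: offset, in the ORIGINAL file, of the first
// content byte stored in part AIndex (1-based, as in the 001 suffix).
function PartOffset(const ADestSize: Cardinal; const AIndex: Integer): Int64;
var
  vMasterData, vPartData: Int64;
begin
  vMasterData := Int64(ADestSize) - SizeOf(TMasterHeader); // content bytes in part 1
  vPartData := Int64(ADestSize) - SizeOf(TFileHeader);     // content bytes in parts 2, 3, ...
  if AIndex <= 1 then
    Result := 0
  else
    Result := vMasterData + Int64(AIndex - 2) * vPartData;
end;

begin
  WriteLn(SizeOf(TMasterHeader)); // 149
  WriteLn(SizeOf(TFileHeader));   // 17
  WriteLn(PartOffset(1024, 2));   // 875
  WriteLn(PartOffset(1024, 3));   // 1882
end.
```

DestSize itself does not need to be stored separately: in the master header, DataPos is SizeOf(TMasterHeader) and DataSize is DestSize - SizeOf(TMasterHeader), so DataPos + DataSize gives DestSize back.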
 
 
The Splitter

Our file splitter is implemented as a class named TFileSplitter. I imagine it will be used like this:

var
  vSplitter: TFileSplitter;
begin
  vSplitter := TFileSplitter.Create;
  try
    // here we supply the file to be split
    vSplitter.Source := ... ;
    vSplitter.DestName := ...; // here we tell the target
                               // file name and folder to
                               // store the split parts
    vSplitter.DestSize := ...; // here we tell the
                               // maximum file size of
                               // each split part
    vSplitter.Execute; // this is where the file
                       // splitting process is done.
  finally
    vSplitter.Free;
  end;
end;

 

or a shorter version like this:

 

var
  vSplitter: TFileSplitter;
begin
  // ASourceFile is the name of the file to be split
  // ADestName is the target file name and folder to store
  // the split parts
  // ADestSize is the maximum file size of each split
  // part
  vSplitter := TFileSplitter.Create(ASourceFile,ADestName, ADestSize);
  try
    vSplitter.Execute; // this is where the file
                       // splitting process is done.
  finally
    vSplitter.Free;
  end;
end; 

 

To support the usage scenarios above, I came up with the following class structure.

type
  TFileSplitter=class
  private
    FSource: string;
    FDestSize: Cardinal;
    FDestName: string;
    FSourceSize: Cardinal;

    procedure SetSource(const Value: string);
  protected
    procedure WriteMasterFile(ASource: TStream;const AFileName: string);
    procedure WriteSplitFile(const ABuffer;const ADataSize: Cardinal; 
      const AFileName: string; const ALastFile: Boolean);
  public
    constructor Create; overload;
    constructor Create(const ASource, ADest: string;const ADestSize: Cardinal); overload;
    procedure Execute;

    // path and file name of the file to be splitted
    property Source: string read FSource write SetSource;
    property SourceSize: Cardinal read FSourceSize;
    // path, base file name, and extension of the split
    // files. The result files are named with the
    // pattern DestName%.3d.ext.
    property DestName: string read FDestName write FDestName;
    // the max size of the result file.
    property DestSize: Cardinal read FDestSize write FDestSize;
  end;
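The DestName comment above describes how the parts are named: the zero-padded index goes between the base name and the extension. Here is a minimal sketch of that rule as a hypothetical stand-alone function, PartFileName (not in the attached source). Unlike the mask that Execute builds, it passes only the index through Format, so a '%' character in a folder name cannot be misread as a format specifier.

```pascal
program PartNames;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: name of part AIndex for a given DestName,
// following the DestName%.3d.ext pattern (index inserted before
// the extension, zero-padded to at least 3 digits).
function PartFileName(const ADestName: string; const AIndex: Integer): string;
begin
  // ChangeFileExt(X, '') strips the extension, or leaves X unchanged
  // when it has none; only the index goes through Format.
  Result := ChangeFileExt(ADestName, '') + Format('%.3d', [AIndex])
    + ExtractFileExt(ADestName);
end;

begin
  WriteLn(PartFileName('movie.part', 1)); // movie001.part
  WriteLn(PartFileName('movie', 12));     // movie012
end.
```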

 
In the TFileSplitter class above, I think the most interesting code lies in the Execute, WriteMasterFile, and WriteSplitFile methods.
 

procedure TFileSplitter.Execute;
var
  SF: TStream; // stream to hold the content of source
               // file
  vCount: Cardinal; // to hold the number of bytes
                    // successfully read from the source
                    // stream
  vBuffer: Pointer;
  vBufferSize: Cardinal;
  vDestExt: string;
  vDestName: string;
  vFileIndex: Integer;
  vDestNameMask: string;
begin
  vDestExt := ExtractFileExt(FDestName);
  if vDestExt='' then
    vDestNameMask := FDestName + '%.3d'
  else
    vDestNameMask := ChangeFileExt(FDestName, '')+ '%.3d' + vDestExt;

  if not FileExists(FSource) then
    raise Exception.Create(Format('Source file (%s) does not exist', [FSource]));

  if FDestSize < 1024 then
    raise Exception.Create('Destination minimum size must be 1 kB (1024 bytes)');

  SF := TFileStream.Create(FSource, fmOpenRead or fmShareDenyNone);
  try
    vFileIndex := 1;
    vDestName := Format(vDestNameMask, [vFileIndex]);
    WriteMasterFile(SF, vDestName);
    vBufferSize := FDestSize - SizeOf(TFileHeader);
    GetMem(vBuffer, vBufferSize);
    try
      while SF.Position < SF.Size do
      begin
        Inc(vFileIndex);
        vDestName := Format(vDestNameMask, [vFileIndex]);
        vCount := SF.Read(vBuffer^, vBufferSize);
        WriteSplitFile(vBuffer^, vCount, vDestName, SF.Position=SF.Size);
      end;
    finally
      FreeMem(vBuffer);
    end;
  finally
    SF.Free;
  end;
end;

procedure TFileSplitter.WriteMasterFile(ASource: TStream;const AFileName: string);
var
  DF: TStream;
  vHeader: TMasterHeader;
  vFileName: string;
begin
  FillChar(vHeader, SizeOf(TMasterHeader), 0);
  vHeader.ID := MAGIC_NUMBER;
  vHeader.Flags := FLAG_MASTERFILE; // indicates the
                                    // header is a master
                                    // header
  vHeader.DataPos := SizeOf(TMasterHeader);
  vHeader.DataSize := FDestSize - SizeOf(TMasterHeader);
  vFileName := ExtractFileName(Source);
  vHeader.FileName := vFileName;
  vHeader.OriSize := FSourceSize;
  try
    DF := TFileStream.Create(AFileName, fmCreate);
    try
      DF.Write(vHeader, SizeOf(TMasterHeader));
      if DF.CopyFrom(ASource, vHeader.DataSize)< vHeader.DataSize then
        raise Exception.Create('Source file does not '
                               + 'have enough data. There is no point to '
                               + 'split it. Abort splitting');
    finally
      DF.Free;
    end;
  except
    DeleteFile(AFileName);
    raise;
  end;
end;

procedure TFileSplitter.WriteSplitFile(const ABuffer;const ADataSize: Cardinal;
  const AFileName: string;const ALastFile: Boolean);
var
  DF: TStream;
  vHeader: TFileHeader;
begin
  vHeader.ID := MAGIC_NUMBER;
  if ALastFile then
    vHeader.Flags := FLAG_LASTFILE
  else
    vHeader.Flags := 0;

  vHeader.DataPos := SizeOf(TFileHeader);
  vHeader.DataSize := ADataSize;
  DF := TFileStream.Create(AFileName, fmCreate);
  try
    DF.Write(vHeader, SizeOf(TFileHeader));
    DF.Write(ABuffer, ADataSize);
  finally
    DF.Free;
  end;
end;
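From Execute and WriteMasterFile we can predict how many part files a split produces: the first part carries DestSize - SizeOf(TMasterHeader) content bytes, and every following part carries up to DestSize - SizeOf(TFileHeader). The sketch below is a hypothetical helper, ExpectedPartCount (not in the attached source), that assumes DestSize is at least 1024, as Execute requires.

```pascal
program PartCount;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  String127 = string[127];

  TMasterHeader = packed record
    ID: array[1..8] of AnsiChar;
    Flags: Byte;
    DataPos: Cardinal;
    DataSize: Cardinal;
    FileName: String127;
    OriSize: Cardinal;
  end;

  TFileHeader = packed record
    ID: array[1..8] of AnsiChar;
    Flags: Byte;
    DataPos: Cardinal;
    DataSize: Cardinal;
  end;

// Hypothetical helper: number of part files Execute writes for a
// source of ASourceSize bytes. 0 means WriteMasterFile would raise,
// because the source cannot even fill the master part.
function ExpectedPartCount(const ASourceSize: Int64; const ADestSize: Cardinal): Int64;
var
  vMasterData, vPartData, vRest: Int64;
begin
  vMasterData := Int64(ADestSize) - SizeOf(TMasterHeader);
  vPartData := Int64(ADestSize) - SizeOf(TFileHeader);
  if ASourceSize < vMasterData then
  begin
    Result := 0;
    Exit;
  end;
  vRest := ASourceSize - vMasterData;
  // ceiling division: a partly filled last part still counts as a file
  Result := 1 + (vRest + vPartData - 1) div vPartData;
end;

begin
  WriteLn(ExpectedPartCount(100, 1024));  // 0
  WriteLn(ExpectedPartCount(875, 1024));  // 1
  WriteLn(ExpectedPartCount(1882, 1024)); // 2
  WriteLn(ExpectedPartCount(1883, 1024)); // 3
end.
```

Note the one-part case: when the source exactly fills the master part, the while loop in Execute never runs, so WriteSplitFile is never called and no file carries FLAG_LASTFILE.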

 
 
Demo Project

Here we will build a demo project to see whether our file splitter works properly. To check the result, we will inspect the split parts with hex editor software, to see whether our scheme is followed correctly.

In this tutorial I used HxD Hex Editor. It's free and, in my opinion, very good. I once used it to restore a lost partition whose contents were very important to me. Get more information about it here.
 
 
Preparing Demo Application and Its GUI

  • Create a new application (in newer Delphi versions, choose New VCL Forms Application). You will also get a form named Form1. Save the project as ccSplitter and the form unit as Form_Main.pas.
  • Drop a TPageControl. You can find it on the Win32 tab of the Component Palette. Leave its name at the default, PageControl1. Set its Align property to alClient. This makes PageControl1 cover the whole client area of Form1.
  • Right click on the TPageControl and select the New Page menu item. This creates a new page, a control of class TTabSheet, in the TPageControl. The page will automatically be named TabSheet1. Change its Caption property to "Split File!".
    This page is where the user enters the information needed to split a file.
  • Drop a TEdit onto TabSheet1. Name it edtSourceFile. Size it so its width is nearly the same as the width of TabSheet1, leaving about 30-40 pixels of space to its right.
    This is where we show the user which file will be split into smaller parts.
  • Drop a TSpeedButton to the right of edtSourceFile. It will be automatically named SpeedButton1. Leave it as is.
    This button will be used to activate a file selection dialog. The result of file selection will be used as source file, which is stored in edtSourceFile.
  • Drop another TEdit onto TabSheet1, below edtSourceFile. Name it edtDestFile. Size it so its Left and Width are the same as edtSourceFile's.
    This is where the user specifies the name of the parts. Our program will add the index number after the file name, before the extension.
  • Drop another TSpeedButton, this time to the right of edtDestFile. It will be automatically named SpeedButton2. Leave it as is.
    This button is to activate the file selection dialog to be used for destination file name. The file selection operation result will be stored in edtDestFile.
  • Drop a TComboBox below edtDestFile. Name it cbbSourceSize, and set its Style property to csDropDownList.
    This is where we show the size of the source file. We use a combo box so the user can easily choose to see the size in bytes, kilobytes, or megabytes.
  • Drop a TEdit to the right of cbbSourceSize. Name it edtDestSize.
    This is where the user enters, by typing, the maximum size of each part.
  • Drop a TUpDown to the right of edtDestSize. TUpDown usually sits on the Win32 tab of the Component Palette. Name it udDestSize. Set its Min property to "1", Max to "1000", and Associate to edtDestSize so the two controls stay in sync.
    This is where the user sets, with mouse clicks, the maximum size of each part.
  • Drop another TComboBox to the right of udDestSize. Name it cbbSizeUnit. Set its Style property to csDropDownList. Enter these (as separate lines) in its Items property.
    • kB
    • MB
    Finally, set its ItemIndex to "0". This selects kB as the default item.
  • Drop a TBitBtn under and left aligned with cbbSourceSize. It will be automatically named BitBtn1. Leave it as is, but change its Caption to "Split!".
  • Drop a TOpenDialog onto the form. It will automatically be named OpenDialog1. Leave it as is.
  • Add a few TLabels and rearrange the controls to get something like what is shown below.

    FileSplitter_Design01.png

Coding The Demo Application

    • Add a new unit to the project and save it as FileSplitter. Place all the code we have discussed above (TFileSplitter and the header declarations) into this unit. You can check your result against the FileSplitter.pas file included in the source code attached at the end of this tutorial.
    • Activate Form1 in the IDE, then add the FileSplitter unit to the main form's local uses list with the menu File - Use Unit, and select FileSplitter in the list that pops up.
    • Declare a public procedure for TForm1, like this.
      public
        procedure SetSourceFileSize(const AFileSize: Int64);
      and here is its implementation.
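      procedure TForm1.SetSourceFileSize(const AFileSize: Int64);
      var
        vIndex: Integer;
      begin
        // Clear resets ItemIndex to -1, so remember the unit
        // the user selected before refilling the list
        vIndex := cbbSourceSize.ItemIndex;
        cbbSourceSize.Clear;
        cbbSourceSize.Items.Add(Format('%d Bytes', [AFileSize]));
        cbbSourceSize.Items.Add(Format('%d kB', [AFileSize div 1024]));
        cbbSourceSize.Items.Add(Format('%f MB', [AFileSize/(1024*1024)]));
        if vIndex < 0 then
          vIndex := 1; // default to kB
        cbbSourceSize.ItemIndex := vIndex;
      end;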
    • Declare a public function for TForm1, like this.
      public
        function GetDestFileSize: Cardinal;
      and here is its implementation.
      function TForm1.GetDestFileSize: Cardinal;
      begin
        case cbbSizeUnit.ItemIndex of
          1: Result := udDestSize.Position * 1024 * 1024;
          else
            Result := udDestSize.Position * 1024;
        end;
      end;
    • Double click on SpeedButton1 to generate skeleton code for its OnClick event handler. Put the following code in the event handler.
      procedure TForm1.SpeedButton1Click(Sender: TObject);
      var
        vFile: File;
      begin
        with OpenDialog1 do
        begin
          // adjust the behavior of the open file
          // dialog. We want it to show
          // and accept only existing files
          Options := [ofPathMustExist, ofFileMustExist, ofEnableSizing];
      
          // if previously user had selected a file
          // for the source, use it as
          // default selection.
          if edtSourceFile.Text <> '' then
            FileName := edtSourceFile.Text;
      
          // if the user cancels the dialog, we don't
          // need to continue
          if not Execute then Exit;
      
          // store the file selected in open file
          // dialog
          edtSourceFile.Text := FileName;
          // suggest name for the destination file
          // derived from the new selected source
          // file
          edtDestFile.Text := ChangeFileExt(FileName, '.part');
      
          // here we want to get the selected source
          // file size.
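          AssignFile(vFile, FileName);
          FileMode := 0; // open read-only (fmOpenRead)
          // open before the try block, so CloseFile only
          // runs on a file that was actually opened
          Reset(vFile, 1);
          try
            SetSourceFileSize(FileSize(vFile));
          finally
            CloseFile(vFile);
          end;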
        end;
      end;
    • Double click on SpeedButton2 to generate skeleton code for its OnClick event handler. Put the following code in the event handler.
      procedure TForm1.SpeedButton2Click(Sender: TObject);
      begin
        with OpenDialog1 do
        begin
          // adjust the behavior of the open file
          // dialog, so user can select non-existing
          // file
          Options := [ofPathMustExist,ofEnableSizing];
          if edtDestFile.Text <> '' then
            FileName := edtDestFile.Text;
      
          if not Execute then Exit;
      
          // store the selected file information to the
          // edit control
          edtDestFile.Text := FileName;
        end;
      end;
    • Double click on BitBtn1 to generate skeleton code for its OnClick event handler. Put the following code in the event handler.
      procedure TForm1.BitBtn1Click(Sender: TObject);
      var
        vFileSplitter: TFileSplitter;
      begin
        vFileSplitter := TFileSplitter.Create;
        try
          vFileSplitter.Source := edtSourceFile.Text;
          vFileSplitter.DestName := edtDestFile.Text;
          vFileSplitter.DestSize := GetDestFileSize;
          vFileSplitter.Execute;
        finally
          vFileSplitter.Free;
        end;
      end;

     
      Note that this is where we actually use our TFileSplitter class.

Now we are ready to rumble!

Running and Testing
    • Run the demo project by pressing F9. Click SpeedButton1 and select an existing file as the source. Then choose a reasonable part size. You will get something similar to this:

      FileSplitter_Run01.png
    • Click BitBtn1. Then open the destination folder in Windows Explorer. You will see something like the following.

      FileSplitter_Result01.png
    • Now open your hex editor (I am going to use HxD). With the hex editor, open the first part file (the one with the 001 suffix). It will show something similar to the image below.

      FileSplitter_Result02.png

      Legend:
      • The blue area is our header bytes.
      • The green "highlight" shows our magic number.
      • The red "highlight" shows the original file name that we stored.
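If you prefer to check the header without a hex editor, the same fields can be read back with a TFileStream. This is a minimal sketch of a hypothetical console tool (not in the attached project). It assumes the packed TMasterHeader layout above and that flag $01 marks the master file, as the Flags comment states, and it takes the first part file as its command-line argument.

```pascal
program DumpMasterHeader;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  String127 = string[127];

  TMasterHeader = packed record
    ID: array[1..8] of AnsiChar;
    Flags: Byte;
    DataPos: Cardinal;
    DataSize: Cardinal;
    FileName: String127;
    OriSize: Cardinal;
  end;

const
  MAGIC: array[1..8] of AnsiChar = 'SPLITTED';

// Hypothetical helper: one-line summary of the master header at the
// start of AFileName, or '' if the file is too short, lacks the
// SPLITTED signature, or is not flagged as the master file ($01).
function DescribeMasterHeader(const AFileName: string): string;
var
  FS: TFileStream;
  vHeader: TMasterHeader;
  vRead: Integer;
begin
  Result := '';
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyNone);
  try
    vRead := FS.Read(vHeader, SizeOf(vHeader));
  finally
    FS.Free;
  end;
  if (vRead < SizeOf(vHeader)) or
     not CompareMem(@vHeader.ID, @MAGIC, SizeOf(MAGIC)) or
     ((vHeader.Flags and $01) = 0) then
    Exit;
  Result := Format('Flags=$%.2x DataPos=%d DataSize=%d FileName=%s OriSize=%d',
    [vHeader.Flags, Int64(vHeader.DataPos), Int64(vHeader.DataSize),
     string(vHeader.FileName), Int64(vHeader.OriSize)]);
end;

var
  vInfo: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: DumpMasterHeader <part file with 001 suffix>');
    Exit;
  end;
  vInfo := DescribeMasterHeader(ParamStr(1));
  if vInfo = '' then
    WriteLn('not a master part file')
  else
    WriteLn(vInfo);
end.
```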
Here is the complete source code of the file splitter discussed in this tutorial: ccSplitter.zip (523.65 KB).

Feel free to use or improve it. The tutorial continues in the second part, where we will discuss how to reassemble the parts to get the original file.

Edited by LuthfiHakim, 25 February 2013 - 08:06 AM.




