FilePurge

Removes duplicate records from a file.

Syntax:

FilePurge( source-file, destination-file[, ignore-case[, character-set]])

Parameters:

(s) source-file: name of file to purge.

(s) destination-file: name of file with duplicates removed.

(i) ignore-case: (optional) @FALSE (default) or @TRUE, indicating whether record comparisons are case-sensitive or case-insensitive, respectively.

(i) character-set: (optional) see below.

Returns:

(i) The number of duplicate records detected in source-file.

 

FilePurge removes duplicate records from both large and small text files whose records are terminated by either a carriage return/line feed pair or a single line feed. Use the source-file parameter to specify the file to purge, and the destination-file parameter to specify the file that receives the contents of source-file with duplicates removed. The 32-bit version of the function can process files of up to about 700MB, depending on available process memory. The 64-bit version can process files up to the size of available system virtual memory, defined here as the combined size of the system swap file and system physical memory.

Note: The same file cannot be used for both source-file and destination-file.
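
The following Python sketch models the duplicate removal described above. It is not WinBatch code and not FilePurge's implementation. Because this page does not say, it assumes that the first copy of each record is kept, that records keep their original order, that a record's CR/LF or LF terminator is not part of the comparison, and that every extra copy of a record counts as one duplicate. The helper names split_records and purge are hypothetical.

```python
def split_records(text):
    """Split text into records terminated by CR/LF or LF, dropping the terminators."""
    records = text.split("\n")
    if records and records[-1] == "":
        records.pop()  # the text ended with a terminator, so there is no extra record
    # Drop the CR left by a CR/LF terminator (assumption: CR/LF and LF records compare equal).
    return [r[:-1] if r.endswith("\r") else r for r in records]


def purge(text):
    """Return (kept_records, duplicate_count) for a case-sensitive purge.

    Assumption: the first occurrence of each record is kept, in original order.
    """
    seen = set()
    kept = []
    duplicates = 0
    for record in split_records(text):
        if record in seen:
            duplicates += 1  # every extra copy counts as one duplicate
        else:
            seen.add(record)
            kept.append(record)
    return kept, duplicates


if __name__ == "__main__":
    kept, dups = purge("red\r\ngreen\r\nred\r\nRed\r\n")
    print(kept, dups)  # ['red', 'green', 'Red'] 1
```

"Red" survives here because the comparison is case-sensitive, which corresponds to FilePurge's default ignore-case setting of @FALSE.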

Ignore-case

Set this parameter to @TRUE (1) to compare records without regard to the case of each character. Set it to @TRUE only when source-file contains ANSI or Windows Unicode characters. If the function detects that source-file is neither ANSI nor Windows Unicode, it generates an Invalid flag error.

Note: Setting this parameter to @TRUE significantly increases the time the function takes to process a large file.
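
This Python sketch shows how the ignore-case flag changes which records count as duplicates, and models the Invalid flag error as a ValueError. It is a hypothetical model, not FilePurge itself: the function name count_duplicates, the CASE_INSENSITIVE_CHARSETS set, and the default character_set of 1 are assumptions, and the character set is passed in rather than detected from the file as FilePurge does. Python's str.casefold stands in for whatever case mapping FilePurge applies, which this page does not document. The values 1 (ANSI) and 2 (Windows Unicode) come from the character-set table on this page.

```python
# Character-set values (from the table on this page) for which ignore-case is allowed:
# 1 = ANSI, 2 = Windows Unicode (UTF-16 LE).
CASE_INSENSITIVE_CHARSETS = {1, 2}


def count_duplicates(records, ignore_case=False, character_set=1):
    """Count extra copies of records; hypothetical model, not FilePurge itself."""
    if ignore_case and character_set not in CASE_INSENSITIVE_CHARSETS:
        # FilePurge reports an Invalid flag error in this case; modeled as ValueError.
        raise ValueError("Invalid flag: ignore-case requires ANSI or Windows Unicode")
    seen = set()
    duplicates = 0
    for record in records:
        # str.casefold stands in for the case mapping FilePurge actually applies.
        key = record.casefold() if ignore_case else record
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


if __name__ == "__main__":
    words = ["Apple", "APPLE", "apple", "pear"]
    print(count_duplicates(words))                    # 0
    print(count_duplicates(words, ignore_case=True))  # 2
```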

Character-set

This optional parameter can have one of the following values:

Value   Meaning
1       Source-file contains ANSI (multibyte) characters (the current user locale).
2       Source-file contains Windows Unicode characters (UTF-16 LE).
3       Source-file contains byte-reversed two-byte characters (UTF-16 BE).
4       Source-file contains UTF-8 characters.

Use the optional character-set parameter to specify the character set of source-file.

Note: A byte-order mark (BOM) at the beginning of a file causes the value of this parameter to be ignored. UTF-16 files generally include a BOM. Because the BOM is optional for UTF-8 files, UTF-8 files are more likely to lack one.
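
The BOM rule can be modeled in Python as follows. The sketch is hypothetical: the function name choose_encoding and the codec mapping are assumptions, and locale.getpreferredencoding(False) only approximates "the current user locale" for value 1. This page does not say what FilePurge assumes when a file has no BOM and character-set is omitted, so the sketch raises ValueError in that case rather than guess.

```python
import codecs
import locale
from typing import Optional, Tuple

# Hypothetical mapping from the character-set values in the table to Python codecs.
CHARSET_CODECS = {
    1: locale.getpreferredencoding(False),  # ANSI: approximates the current user locale
    2: "utf-16-le",  # Windows Unicode
    3: "utf-16-be",  # byte-reversed two-byte characters
    4: "utf-8",
}

# Recognized byte-order marks; none of these is a prefix of another.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def choose_encoding(head: bytes, character_set: Optional[int] = None) -> Tuple[str, int]:
    """Return (codec name, BOM length in bytes) for a file starting with head."""
    for bom, codec in BOMS:
        if head.startswith(bom):
            # A BOM overrides any character_set value passed in.
            return codec, len(bom)
    if character_set not in CHARSET_CODECS:
        # Behavior without a BOM or a valid character_set is undocumented; refuse to guess.
        raise ValueError("no BOM found and no valid character_set given")
    return CHARSET_CODECS[character_set], 0
```

For example, choose_encoding(codecs.BOM_UTF8 + b"id,name\r\n", character_set=2) returns ("utf-8", 3): the UTF-8 BOM wins over the requested UTF-16 LE character set.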

This function supports extended-length path names.

Example:
;;; FilePurge_Example.wbt
; The source file has 4,000,000 records averaging 139 bytes each (about 556 MB).
strToPurge = "F:\temp\Combined.csv"
strPurged  = "F:\temp\Purged.csv"
; Delete any output left over from a previous run.
If FileExist(strPurged) Then FileDelete(strPurged)
; Third parameter @TRUE: compare records without regard to case.
Dups = FilePurge(strToPurge, strPurged, @TRUE)
Message("FilePurge Example", Dups:" duplicates found")

See Also:

BinaryPokeStrW, FileOpen, FileRead, FileWrite