Removes duplicate records from a file.
FilePurge(source-file, destination-file [, ignore-case [, character-set]])
Parameters:
(s) source-file: name of the file to purge.
(s) destination-file: name of the file to receive the records with duplicates removed.
(i) ignore-case: (optional) @FALSE (default) or @TRUE, indicating whether record comparisons are case-sensitive or case-insensitive, respectively.
(i) character-set: (optional) a value from 1 to 4 identifying the character set of source-file; see below.
Returns:
(i) The number of duplicate records detected in source-file.
FilePurge removes duplicate records from large and small text files. Records may end with either a carriage return/line feed pair or a line feed alone. Use the source-file parameter to specify the file to purge. Use the destination-file parameter to specify the file that receives the contents of source-file with duplicates removed. The 32-bit version of the function can process files up to about 700 MB, depending on available process memory. The 64-bit version can process files up to the size of available system virtual memory. System virtual memory is the combined size of the system swap file and system physical memory.
Note: The same file cannot be used for both source-file and destination-file.
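The following is a minimal sketch of a call that passes only the two required parameters. Because ignore-case is omitted, records are compared case-sensitively. The paths and variable names are hypothetical placeholders and must name two different files.

```winbatch
; Hypothetical input and output paths (must be different files).
strSource = "C:\temp\input.txt"
strDest = "C:\temp\output.txt"
; ignore-case and character-set are omitted, so comparison is case-sensitive.
Dups = FilePurge(strSource, strDest)
Message("FilePurge", Dups:" duplicate records found")
```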
Ignore-case
Set this parameter to @TRUE (1) to match records without regard to the case of each character. Set it to @TRUE only when source-file contains ANSI or Windows Unicode characters. If the function detects that source-file is neither ANSI nor Windows Unicode, it generates an "Invalid flag" error.
Note: Setting this parameter to @TRUE significantly increases the time the function takes to process a large file.
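This sketch shows a case-insensitive purge of a Windows Unicode (UTF-16 LE) file. The source is Windows Unicode, so setting ignore-case to @TRUE is permitted. The file names are hypothetical.

```winbatch
; Hypothetical UTF-16 LE (Windows Unicode) source and output files.
strSource = "C:\temp\unicode_in.txt"
strDest = "C:\temp\unicode_out.txt"
; @TRUE = case-insensitive matching; 2 = Windows Unicode (UTF-16 LE).
; Case-insensitive matching is allowed because the source is Windows Unicode.
Dups = FilePurge(strSource, strDest, @TRUE, 2)
Message("FilePurge", Dups:" duplicate records found")
```

On a large file, expect this call to run noticeably slower than a case-sensitive purge.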
Character-set
This optional parameter can have one of the following values:
Value | Meaning
1 | Source-file contains ANSI (multibyte) characters (current user locale).
2 | Source-file contains Windows Unicode characters (UTF-16 LE).
3 | Source-file contains byte-reversed two-byte characters (UTF-16 BE).
4 | Source-file contains UTF-8 characters.
Use the optional character-set parameter to specify the character set of source-file.
Note: If a file begins with a byte-order mark (BOM), the value of this parameter is ignored. UTF-16 files generally include a BOM. A BOM is optional for UTF-8 files, so UTF-8 files are more likely to lack one.
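A BOM overrides this parameter, so character-set matters mainly for files that lack one, such as many UTF-8 files. The following is a sketch that assumes a hypothetical BOM-less UTF-8 file. Ignore-case stays @FALSE because case-insensitive matching supports only ANSI or Windows Unicode sources.

```winbatch
; Hypothetical UTF-8 source file saved without a byte-order mark.
strSource = "C:\temp\utf8_in.txt"
strDest = "C:\temp\utf8_out.txt"
; @FALSE: case-sensitive (@TRUE is not valid for UTF-8 sources).
; 4 = UTF-8; this value would be ignored if the file began with a BOM.
Dups = FilePurge(strSource, strDest, @FALSE, 4)
Message("FilePurge", Dups:" duplicate records found")
```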
This function supports extended-length path names.
Example:
;;; FilePurge_Example.wbt
; The source file has 4,000,000 records averaging 139 bytes per record.
strToPurge = "F:\temp\Combined.csv"
strPurged = "F:\temp\Purged.csv"
; Remove any output left over from a previous run.
If FileExist(strPurged) Then FileDelete(strPurged)
; @TRUE = compare records case-insensitively.
Dups = FilePurge(strToPurge, strPurged, @TRUE)
Message("FilePurge Example", Dups:" duplicates found")