Match Up

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Aacini
Expert
Posts: 1913
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Match Up

#16 Post by Aacini » 14 Jan 2015 17:01

catalinnc wrote: this one is very light on variables... feel free to edit it to make it more readable... tested on Win XP SP3, up to date...

I am afraid you misunderstood the core point of this problem. The problem is not that the program may use a large number of variables; it is that when that number is very large, the time required to process a large file may become excessive.

Yes, your program uses few variables, but it takes too much time to process a file compared with the other solution. I ran a couple of timing tests with both programs: the first with the Sample Text File given in the first post of this topic, and the second with a file 10 times larger (70 lines). Here are the results:

Code:

C:\> Aacini.bat
Start time: 16:24:56.90
End   time: 16:24:56.91

C:\> catalinnc.bat
time is 16:27:24.20 for starting operation
time is 16:27:24.26 for ending operation
Press any key to continue . . .

C:\> for /L %i in (1,1,10) do @type "_sample text file.txt" >> "_sample text file2.txt"

C:\> copy "_sample text file2.txt" "_sample text file.txt" /Y
        1 file(s) copied.

C:\> Aacini.bat
Start time: 16:39:59.56
End   time: 16:39:59.59

C:\> catalinnc.bat
time is 16:40:07.37 for starting operation
time is 16:40:10.49 for ending operation
Press any key to continue . . .
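The timestamps above have to be subtracted by hand. As a minimal sketch (not part of either program), the helper below converts two %time%-style readings to centiseconds and prints the difference. The label :toCentis and the variable names are hypothetical, and the sketch assumes an HH:MM:SS.cc layout (a comma decimal separator is also accepted) with both readings taken on the same day. The two readings used here are the ones from the last catalinnc.bat run.

```batch
@echo off
setlocal enableextensions
rem Sketch: elapsed time between two %time%-style readings, in centiseconds.
rem Assumption: HH:MM:SS.cc layout and both readings on the same day.

call :toCentis "16:40:07.37" t0
call :toCentis "16:40:10.49" t1
set /a elapsed=t1-t0
echo Elapsed: %elapsed% centiseconds
endlocal
goto :eof

:toCentis  timeString  resultVar   (hypothetical helper)
rem Split the reading on colon, period, comma and space.
rem "100%%x %% 100" strips leading zeros so 08 and 09 are not read as octal.
for /f "tokens=1-4 delims=:., " %%a in ("%~1") do (
    set /a "%~2=((100%%a %% 100)*60+(100%%b %% 100))*60*100+(100%%c %% 100)*100+(100%%d %% 100)"
)
goto :eof
```

With these two readings the difference is 3.12 seconds, so the script reports 312 centiseconds. In a real timing run the two literals would be replaced by %time% captured before and after the work.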

Antonio

catalinnc
Posts: 39
Joined: 12 Jan 2015 11:56

Re: Match Up

#17 Post by catalinnc » 16 Jan 2015 13:56

Aacini wrote: I am afraid you misunderstood the core point of this problem. The problem is not that the program may use a large number of variables; it is that when that number is very large, the time required to process a large file may become excessive.

Yes, your program uses few variables, but it takes too much time to process a file compared with the other solution.


OK, here is a solution that is light on time too:

Code:

@echo off

setlocal enabledelayedexpansion

echo time is %time% for starting operation

sort < "SampleTextFile.txt" > "SampleTextFileSorted.txt"

type nul > "DuplicateFound.txt"

for /f "delims=" %%A in (SampleTextFileSorted.txt) do (

set "_string_full=%%A"
set "_string_short=!_string_full:~0,44!"

if /i [!_string_short!] equ [!_cache_short!] (

if /i [!_first_time!] equ [true] (

(
echo !_cache_full!
echo !_string_full!
)>> "DuplicateFound.txt"

set "_first_time=false"

) else (echo !_string_full!>> "DuplicateFound.txt")

) else (set "_first_time=true")

set "_cache_full=!_string_full!"
set "_cache_short=!_string_short!"

)

echo time is %time% for ending operation

endlocal

pause



P.S. This solution should shine when "SampleTextFile.txt" has thousands of lines!

Off-topic:

Here is a solution for filtering duplicate lines out of "SampleTextFile.txt":

Code:

@echo off

setlocal enabledelayedexpansion

echo time is %time% for starting operation

sort < "SampleTextFile.txt" > "SampleTextFileSorted.txt"

type nul > "UniqueEntriesFound.txt"

for /f "delims=" %%A in (SampleTextFileSorted.txt) do (

set "_string_full=%%A"

if /i [!_string_full!] neq [!_cache_full!] (echo !_string_full!>> "UniqueEntriesFound.txt")

set "_cache_full=!_string_full!"

)

echo time is %time% for ending operation

endlocal

pause



Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Match Up

#18 Post by Squashman » 16 Jan 2015 14:46

Code:

sort < "SampleTextFile.txt" > "SampleTextFileSorted.txt"

Rookie mistake there. Always use the /O option for SORT output to a file. Much faster on large files.
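A minimal sketch of that change, using the file names from the quoted line; SORT then writes the output file itself instead of going through redirection:

```batch
@echo off
rem Sketch: SORT writes the sorted file itself via /O; no redirection needed.
rem File names are taken from the script quoted above.
sort "SampleTextFile.txt" /O "SampleTextFileSorted.txt"
```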

Here is some dedupe code that I believe Dave has posted in the past. It simply dedupes a file, so the output contains one instance of each line.

Code:

:DEDUPE
:: DEDUPE File
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" /O "%sorted%"
>"%deduped%" (
  set "prev="
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    if /i "!ln!" neq "!prev!" (
      endlocal
      (echo %%A)
      set "prev=%%A"
    ) else endlocal
  )
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
GOTO :EOF

catalinnc
Posts: 39
Joined: 12 Jan 2015 11:56

Re: Match Up

#19 Post by catalinnc » 17 Jan 2015 13:20

Squashman wrote: Rookie mistake there. Always use the /O option for SORT output to a file. Much faster on large files.


Thanks a lot for the tip.
