search for previous item

darioit
Posts: 230
Joined: 02 Aug 2010 05:25

#1 Post by darioit » 29 Jan 2022 02:20

Hello, I have this problem:
I have a very large text file listing the contents of zip files, created using WinRAR.

The file filearchive.txt contains:
#Archivio E:\pippo.zip
2021-01-01 01:01 12345 doc1.pdf
2021-01-02 01:02 23451 doc2.pdf
#Archivio E:\pluto.zip
2021-01-03 01:01 34556 doc3.pdf
2021-01-04 01:02 78123 doc4.pdf
etc.

Using "find" or "findstr" it is easy to search for a document, "doc2.pdf" for example, but how can I get the name of the zip ("E:\pippo.zip") from the preceding #Archivio row?

Thank you in advance.

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

#2 Post by aGerman » 29 Jan 2022 05:52

You have to compare line numbers (notice option /n).

Code:

@echo off &setlocal
set "file=filearchive.txt"
set "search=doc2.pdf"

:: find the line number of the line that ends with <space>%search%
set "row="
for /f "delims=:" %%i in ('findstr /nec:" %search%" "%file%"') do set "row=%%i"
if not defined row exit /b

:: find the last line that begins with # and has a line number less than the previously found line number
for /f "tokens=1,2* delims=: " %%i in ('findstr /nbc:"#" "%file%"') do if %%i lss %row% (set "archive=%%k") else goto escape
:escape

echo "%search%" found in "%archive%"
pause

Steffen

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

#3 Post by Squashman » 29 Jan 2022 11:47

Another option. Probably not as fast as aGerman's solution. I don't have a big file to test with.

Code:

@echo off &setlocal
set "file=filearchive.txt"
set "search=doc3.pdf"

for /f "usebackq tokens=1,* delims= " %%G in ("%file%") do (
	if /I "%%G"=="#Archivio" (
		set "archive=%%H"
	) ELSE (
		FOR /F "tokens=1,2,* delims= " %%I in ("%%H") DO (
			IF /I "%%K"=="%search%" GOTO FOR_DONE
		)
	)
)
:FOR_DONE
Echo Archive is: %archive%

#4 Post by darioit » 29 Jan 2022 11:59

Many thanks, Steffen. I changed the first search from "findstr /nec:" to "findstr /nc:" so that it also matches on part of a document name.
Just one small problem: if the document is present in more than one zip file, it only reports the last match found.

@Squashman: it is slow and can't find a zip match, sorry.

Regards
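
To illustrate the difference between the two findstr option sets mentioned above, here is a minimal sketch. The sample file name (findstr_demo.txt) and the extra entry my_doc2.pdf are made up for the demonstration; they are not part of the original data.

```batch
@echo off
setlocal
rem Hypothetical sample file, created only for this demonstration
set "sample=%temp%\findstr_demo.txt"
(
   echo #Archivio E:\pippo.zip
   echo 2021-01-01 01:01 12345 doc1.pdf
   echo 2021-01-02 01:02 23451 doc2.pdf
   echo #Archivio E:\pluto.zip
   echo 2021-01-03 01:01 34556 my_doc2.pdf
)>"%sample%"

echo With /e the line must end with the search string:
findstr /nec:" doc2.pdf" "%sample%"

echo Without /e the string may appear anywhere in the line:
findstr /nc:"doc2" "%sample%"

del "%sample%"
```

The first findstr call should report only line 3, because the leading space plus /e pins the match to a complete file name at the end of the line. The second call should report lines 3 and 5, which is what makes partial searches possible but also allows more than one hit.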

#5 Post by Squashman » 29 Jan 2022 12:05

darioit wrote:
29 Jan 2022 11:59
@Squashman slowly but can't find a zip match, sorry

I disagree. It finds the matches based on the data you provided. If it did not find a match, then your real data does not look like the example you gave.

#6 Post by darioit » 29 Jan 2022 12:23

Sorry, you are right, it works. With my example it works fine, but I tried it on the real-world data and it doesn't work.
Last edited by darioit on 29 Jan 2022 12:30, edited 1 time in total.

#7 Post by darioit » 29 Jan 2022 12:26

This is exactly the real-world data:

2015-01-03 22:31 4490416 4490416 d:\zipfile1.7z
# Archivio d:\zipfile1.7z
2015-01-01 22:51 56091 56369 file1.pdf
2015-01-01 22:51 14975 14479 file2.pdf
# Fine dell'archivio
2015-01-03 22:31 4490416 4490416 d:\zipfile2.7z
# Archivio d:\zipfile2.7z
2015-01-02 23:21 14576321 14773827 file3.pdf
2015-01-02 23:21 41092 40119 file4.pdf
etc....

#8 Post by Squashman » 29 Jan 2022 13:09

darioit wrote:
29 Jan 2022 12:26
this is exactly real word
[sample data as posted in #7]

And you should see exactly why my code does not work with that. You have been around long enough to know how the FOR /F command works with TOKENS.

#9 Post by Squashman » 29 Jan 2022 13:30

darioit wrote:
29 Jan 2022 12:26
this is exactly real word
[sample data as posted in #7]

aGerman's code doesn't technically produce the intended output either. Again, the real-world data does not have the number of TOKENS the code expects, because your actual data has a space after the HASH symbol!

So if I search for file2.pdf with aGerman's code, it outputs "Archivio d:\zipfile1.7z". You clearly said you wanted the file path only.
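
The token shift described above can be reproduced with a minimal sketch. It feeds two header lines, written the way findstr /n would print them, through the same tokens/delims options used in the line-number script. The third loop is only an assumed adjustment (one extra token), not something posted in this thread.

```batch
@echo off
setlocal
rem Both sample strings follow the two data formats posted in this thread,
rem prefixed with a line number the way findstr /n prints them.

rem First sample format: no space after the hash
for /f "tokens=1,2* delims=: " %%i in ("1:#Archivio E:\pippo.zip") do echo [%%i] [%%j] [%%k]

rem Real-world format: the space after the hash shifts every token by one
for /f "tokens=1,2* delims=: " %%i in ("2:# Archivio d:\zipfile1.7z") do echo [%%i] [%%j] [%%k]

rem Assumed adjustment: request one more token so the path lands in the fourth variable
for /f "tokens=1,2,3* delims=: " %%i in ("2:# Archivio d:\zipfile1.7z") do echo [%%l]
```

This should print [1] [#Archivio] [E:\pippo.zip], then [2] [#] [Archivio d:\zipfile1.7z], then [d:\zipfile1.7z]. The second line shows where the unwanted "Archivio" prefix comes from.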

#10 Post by Squashman » 29 Jan 2022 14:06

New code.

Code:

@echo off &setlocal
set "file=filearchive2.txt"
set "search=doc190049.pdf"

for /f "usebackq tokens=1,2,* delims= " %%G in (`findstr /IC:"#" /IC:"%search%" "%file%"`) do (
	if /I "%%G"=="#" (
		set "archive=%%I"
	) ELSE (
		FOR /F "tokens=1,2,* delims= " %%J in ("%%I") DO (
			IF /I "%%L"=="%search%" GOTO FOR_DONE
		)
	)
)
:FOR_DONE
Echo Archive is: %archive%

I created a file with 90,000 archives and 10 files in each archive, so the file has 990,000 rows.
I searched for a file that was on row 99,045.

Here are the timings using my code and aGerman's code.

Code:

C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Squashman.bat
Archive is: E:\19004.zip

TimeThis :  Command Line :  Squashman.bat
TimeThis :    Start Time :  Sat Jan 29 13:57:03 2022
TimeThis :      End Time :  Sat Jan 29 13:57:31 2022
TimeThis :  Elapsed Time :  00:00:28.397

C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE aGerman.bat
Archive is: Archivio E:\19004.zip

TimeThis :  Command Line :  aGerman.bat
TimeThis :    Start Time :  Sat Jan 29 13:57:54 2022
TimeThis :      End Time :  Sat Jan 29 13:58:30 2022
TimeThis :  Elapsed Time :  00:00:36.192

Here is the batch file I used to create the test file.

Code:

@echo off
(FOR /L %%G IN (10000,1,99999) DO (
	ECHO # Archivio E:\%%G.zip
	FOR /L %%H IN (0,1,9) DO (
		ECHO 2015-01-01 22:51 56091 56369 doc%%G%%H.pdf
	)
)
)>FileArchive2.txt

#11 Post by darioit » 29 Jan 2022 14:34

Good job, but what if I need to search for only part of a name, such as "19004"?

Thank you in advance.

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

#12 Post by Aacini » 29 Jan 2022 20:46

May I enter the timing test contest? 8)

Code:

@echo off
setlocal

set "file=test.txt"
set "search=file4.pdf"

for /F "tokens=1,2*" %%a in ('findstr "^# %search%" "%file%"') do (
   if "%%a" equ "#" (
      set "archive=%%c"
   ) else (
      goto break
   )
)
:break
echo %archive%

darioit wrote:
29 Jan 2022 14:34
good job, but if I need to search only a parts such "19004"?

You may search for any part you want, as long as that part uniquely identifies the file you want. If the part you specify matches two or more files, only the first one is returned...

Antonio
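
If every matching archive is wanted rather than only the first, the same findstr filter can be kept and the early exit dropped. The following is a minimal sketch under several assumptions: the data uses the real-world "# Archivio <path>" format from post #7, archive paths contain no exclamation marks (delayed expansion would eat them), and the search text contains no spaces. The variable name "last" is introduced here only for illustration.

```batch
@echo off
setlocal EnableDelayedExpansion

set "file=test.txt"
set "search=file2.pdf"

set "archive="
set "last="
rem The two space-separated findstr strings act as alternatives:
rem lines that begin with # or lines that contain the search text.
for /F "tokens=1,2*" %%a in ('findstr "^# %search%" "%file%"') do (
   if "%%a" equ "#" (
      rem An Archivio header opens an archive, any other # line closes it
      if "%%b" equ "Archivio" (set "archive=%%c") else set "archive="
   ) else if defined archive (
      rem Print each archive once, even if several of its files match
      if "!archive!" neq "!last!" echo !archive!
      set "last=!archive!"
   )
)
```

With the sample data from post #7 and search=file2.pdf this should print d:\zipfile1.7z. Because the search text is used as a regular expression, a dot in it matches any character. Clearing the variable on "# Fine dell'archivio" keeps the archive listing lines that sit between two archives from being attributed to the previous one.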

#13 Post by Squashman » 30 Jan 2022 12:41

Aacini wrote:
29 Jan 2022 20:46
May I enter the timing test contest? 8)

Finding the file at row 99045:

Code:

C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Antonio.bat
E:\19004.zip

TimeThis :  Command Line :  Antonio.bat
TimeThis :    Start Time :  Sun Jan 30 12:39:45 2022
TimeThis :      End Time :  Sun Jan 30 12:40:14 2022
TimeThis :  Elapsed Time :  00:00:28.519

And now I realize that I over-programmed my code!

#14 Post by Aacini » 30 Jan 2022 16:46

Squashman wrote:
29 Jan 2022 14:06
Here are the timings using my code and aGerman's code.
Squashman.bat: Elapsed Time 00:00:28.397
aGerman.bat: Elapsed Time 00:00:36.192

Squashman wrote:
30 Jan 2022 12:41
Antonio.bat: Elapsed Time 00:00:28.519
And now I realize that I over programmed my code!

I was expecting a bigger decrease in execution time... :(

I was looking for a different method that allows faster execution. I used the (undocumented) "Searching across line breaks" feature of the findstr command. Here it is:

Code:

@echo off
setlocal EnableDelayedExpansion

set "file=test.txt"
set "search=file4.pdf"

for /F %%a in ('copy /Z "%~F0" NUL') do set ^"CRLF=%%a^
%empty line%
^"

set "searchCRLF=#.*%search%"

:nextSearch
for %%a in ("!CRLF!") do set "searchCRLF=!searchCRLF:#=#.*%%~a!"
findstr "!searchCRLF!" "%file%" > result.tmp
if errorlevel 1 goto nextSearch
for /F "tokens=3" %%a in (result.tmp) do echo %%a

I am pretty sure that this method will be faster (perhaps much faster) than the previous one when the searched file is the first one in its archive. However, it gets slower with every additional line between the archive header and the searched file...

I wonder after how many lines the timing of this method becomes similar to that of the previous one.

Antonio

Eureka!
Posts: 137
Joined: 25 Jul 2019 18:25

#15 Post by Eureka! » 30 Jan 2022 19:07

Probably not the fastest one (didn't check), but for a different perspective.
It *does* return all archives that contain the %search% pdf.

Code:

@echo off

set "file=filearchive2.txt"
set "search=doc190049.pdf"


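	rem Note: sort.exe compares the line-number prefixes as text, not as numbers,
	rem so the result may differ from true reverse file order when digit counts differ.
	findstr.exe /n /I /C:"#" /I /C:"%search%"  "%file%" | sort.exe /r /o temp1.txt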

	for /f "usebackq tokens=1 delims=:" %%I in (`findstr.exe /n /i /c:"%search%" temp1.txt`) do call :GetArchive %%I

	del temp1.txt
goto :EOF


:GetArchive
	for /f "usebackq  skip=%1  tokens=2 delims=#" %%X in (temp1.txt) do (
		echo %%X
		goto :EOF
	)	

It should be possible to skip creating a temp file, but I expected to run into memory constraints.
