Search in 1.txt for missing entries from 2.txt

axi92
Posts: 6
Joined: 02 Jun 2016 04:22

#1 Post by axi92 » 02 Jun 2016 05:19

I have 2 .txt files and the lines are like this:

1.txt

Code: Select all

ezekiel7eu89au4 
drone5sg25ez6
kureze3ft26jc6          5 R3, 5 X3, S, T
ni7up28fu6             5 R4, 5 X4, S, LA
drone9rc88jy5          S, Multi-hack, 5 X4, 5 R4


2.txt

Code: Select all

ada3zc36qq9
blub
ada9yv83mp5
algorithm9gh35cj3
bletchley9ob65ca4


So I want to find the words in 2.txt that are NOT in 1.txt.

I tried this, but it did not work:

Code: Select all

findstr /V /B /N /g:stamm.txt /f:suchindex.txt >> output.txt

The output is:

Code: Select all


...\Neuer Ordner>findstr /V /B /N /g:stamm.txt /f:suchindex.txt  1>>output.txt

...\Neuer Ordner>findstr /V /B /N /g:stamm.txt /f:suchindex.txt  1>>output.txt

...\Neuer Ordner>findstr /V /B /N /g:stamm.txt /f:suchindex.txt  1>>output.txt
........

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

#2 Post by penpen » 02 Jun 2016 17:44

If the file "1.txt" is not too big and no poison characters are possible ('%', '!', ...),
then you could create a single findstr expression:

Code: Select all

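@echo off
setlocal enableExtensions enableDelayedExpansion
rem Start of the command: literal (/L), whole-line (/B /E), inverted (/V) search.
set "search=findstr /B /E /L /V"
rem Append every line of "1.txt" as one /C:"..." search string.
for /F "usebackq tokens=* delims=" %%a in ("1.txt") do (
   set "search=!search! /C:"%%a""
)
rem Print the lines of "2.txt" that match none of the search strings.
%search% "2.txt"
endlocal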


penpen

axi92
Posts: 6
Joined: 02 Jun 2016 04:22

#3 Post by axi92 » 02 Jun 2016 23:21

Thanks for the code, but when I start the batch script the window stays black.
Is it possible to export the lines from 2.txt that were not found in 1.txt into output.txt,
so I can easily copy them into 1.txt?

I renamed 1.txt to stamm.txt and 2.txt to suchindex.txt:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
set "search=findstr /B /E /L /V"
for /F "usebackq tokens=* delims=" %%a in ("stamm.txt") do (
   set "search=!search! /C:"%%a""
)
%search% "suchindex.txt"
endlocal
PAUSE



At the end, the variable %search% contains this:

Code: Select all

findstr /B /E /L /V /C:"johnson3ba26qb2" /C:"jackland8vf92qz5 " /C:"ezekiel7eu89au4 " /C:"drone5sg25ez6 " /C:"kureze3ft26jc6          5 R3, 5 X3, S, T" /C:"ni7up28fu6             5 R4, 5 X4, S, LA" /C:"drone9rc88jy5          S, Multi-hack, 5 X4, 5 R4" /C:"glyph7jb25yw3          S, LA, 5 X2, 5 R2" /C:"ezekiel3xh34ug4       S, 5 X1, 5 R1, FA" /C:"susanna3ku75cm9       US3, Multi-hack, 5 X3, 5 R3" /C:"ingress3nd85fu9       S, 5 X3, 5 R3, Heat Sink" /C:"blue2xc26da2         US3, T, 5 X3, 5 R3" /C:"ada9yv83mp5          US3, Multi-hack, 5 X3, 5 R3" /C:"vi8zu85il7             50 XM, 50 AP, 4 X1, 4 R1" /C:"vi9rp62ex1            50 XM, 50 AP, 4 X1, 4 R1" /C:"lightman4tm34zf3      US4, 5 X4, 5 R4, Heat Sink" /C:"algorithm9ek27ux3      S, T, 5 X3, 5 R3" /C:"glyph6yt84kt8         S, LA, 5 X4, 5 R4" /C:"johnson4yn13db2         US3, FA, S, T, 5 X4, 5 R4   " /C:"eVoLvE7Yo65nm9         US3, LA, 5 X3, 5 R3" /C:"phillips6wc29mc7      S, 5 X4, 5 R4, Heat Sink" /C:"blue3dg99cm6         US3, T, 5 X3, 5 R3" /C:"hubert6db54fa6         US4, 5 X4, 5 R4, FA" /C:"green7dv85mp8         S, LA, 5 X2, 5 R2" /C:"spacetime7ap46rr6      US2, 5 X2, 5 R2, FA" /C:"CASSANDRA2YU35CP6      US1, T, 5 X1, 5 R1" /C:"resonate3yd72he7      US1, 5 X1, 5 R1, FA" /C:"resonate6wb48ec4      US4, 5 X4, 5 R4, FA" /C:"johnson3fx84aw9         US3, LA, 5 X3, 5 R3" /C:"lightman8nd48zb2      US2, 5 X2, 5 R2, Heat Sink" /C:"message5ka73rp4         S, Multi-hack, 5 X4, 5 R4" /C:"devra2gt69qx7         US4, 5 X4, 5 R4, Heat Sink" /C:"powercube5yn73em6      S, 5 X1, 5 R1, FA" /C:"roland7br76tp5         S, 5 X1, 5 R1, FA" /C:"lvboynxaie            1000 XM, 10 C1, 2 Multi-hack, 10 R1, 2 Heat Sink" /C:"vi9uh67mo6            50 XM, 50 AP, 4 X1, 4 R1" /C:"minotaur8bb28et5      T, US2, 5 X2, 5 R2" /C:"message6ca48vf7         S, Multi-hack, 5 X1, 5 R1" /C:"cube8MK95JJ7         US4, 5 X4, 5 R4, Heat Sink" /C:"kjwbwy46yw            15 C2, 10 US3, 20 R1" /C:"jackland4dz47yf6      US2, Multi-hack, 5 X2, 5 R2" /C:"evolution2gu76gm3      
S, 5 X1, 5 R1, Heat Sink" /C:"creativity4pb44pf6      S, 5 X2, 5 R2, Heat Sink" /C:"alignment9nb75yo5      S, 5 X3, 5 R3, Heat Sink" /C:"artifact3ne73hh3      5 R1, 5 X1, S, FA" /C:"artifact4tt67xg9      s, 5 X1, 5 R1, FA" /C:"symbols4ye57bs7         5 R3, 5 X3, US3, MH" /C:"roland8cx62mk4" /C:"jarvis2kn66cz2         US3, T, 5 X3, 5 R3" /C:"inveniri2he69ar3      5 R2, 5 X2, US2, FA" /C:"inveniri2hc78yy4      5 Rs, 5 X1, US1, FA" /C:"hulong7tr85ub6" /C:"hubert4su42qt2         " /C:"field4mo46jx6         S, LA, 5 X3, 5 R3" /C:"farlowe2ft72ym5         S, T, 5 X2, 5 R2" /C:"deaddrop7dt73am6      S, T, 5 X4, 5 R4" /C:"deaddrop6bf98mr2      S, T, 5 X3, 5 R3" /C:"cube8mk95jj7         US4, 5 X4, 5 R4, HS" /C:"evolution6xu68ru7      5 R1, 5 X1, S, HS" /C:"creativity2pc98zp5      S, 5 X2, 5 R2, HS" /C:"creative3vk97yv4      US3, 5 X3, 5 R3, Heat Sink" /C:"susanna7og34vw3         US4, Multi-hack, 5 X4, 5 R4" /C:"vi2jo15nd0            50 AP, 50 XM, 4 R1, 4 X1" /C:"timezero2qm72ut8      S, LA, 5 X1, 5 R1" /C:"80JDFITMAR            5 C8, 5 US8, 10 X8, 10 R8, 5 S            x" /C:"alignment3qh24up8      S, 5 X1, 5 R1, HS                     x" /C:"glyphs4gt83bm4?         US1, LA, 5 X1, 5 R1                     x" /C:"3OUV7WU5A4" /C:"spacetime4je35kf5      US4, 5 X4, 5 R4, FA                     x" /C:"cassandra3wh77rg4      T, US2, 5 X2, 5 R2                     x" /C:"cern2vb46cn7         S, Multi-hack, 5 X1, 5 R1               x" /C:"vi9bb02fk7            50 AP, 50 XM, 4 R1, 4 X1" /C:"ingress9tu32jk7         5 4R, 5 4X, S, HS" /C:"80jdfitmar?            
5 C8, 5 US8, 10 X8, 10 R8, 5 S" /C:"Fully redeemed:" /C:"algorithm9gh35cj3" /C:"bletchley9ob65ca4" /C:"conflict5av38pw2" /C:"field5jk36yh6" /C:"portal7cc88cd2" /C:"ada3zc36qq9" /C:"cern5wu99oq2" /C:"chaotic5gg23pf9" /C:"cube8aa87xd2" /C:"timezero2kk78gx5      " /C:"moyer5pp56fg2" /C:"tycho7vu99ta2" /C:"niantic9ns77ww9" /C:"green3ou25jt4" /C:"evolve5uu33zd4" /C:"jarvis5ye63mv9" /C:"glyphs6gj75yq2" /C:"kureze2sg38gt2" /C:"Need Testing:" /C:"minotaur8dm83gg5" /C:"moyer4wr38qz8" /C:"niantic4rv29wc6" /C:"powercube3hu72ut7" /C:"tycho9uo99qa2" /C:"voynich6sx52zr5" /C:"wolfe7jq38cj3" /C:"4apz5symbolsx9u9b" /C:"creative3nc46wp7" /C:"Cube8aa87xd2?" 

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

#4 Post by Squashman » 03 Jun 2016 06:44

So basically what you are trying to do is merge two files together without any duplicate lines?

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

#5 Post by penpen » 03 Jun 2016 17:38

axi92 wrote:when i start the batch script the window stays black...?
Is it possible to export the missing lines from 2.txt that were not found in the 1.txt into output.txt?
The script above does exactly this, and it works with the test data above (all lines of 2.txt are found)!
If the window stays black, at least three things may have happened:
- there is no such missing line in 2.txt,
- the script takes longer than you expect, or
- somehow a findstr bug is triggered (but I doubt this).
Maybe you could link some real-world data and post what you expect to see, so I can check the results.

The content of your test file ("stamm.txt") seems to be short enough (the resulting command is about 4 KB, below the 8 KB limit),
but you may have other files of problematic length, and the script above may fail on those.
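
The command length can be estimated before the command is built. This is only a rough sketch: it assumes the file is named "1.txt" and contains one byte per character, and it takes 8191 characters as the cmd.exe command line limit behind the 8 KB figure.

Code: Select all

@echo off
setlocal enableExtensions
set "bytes=0"
set "lines=0"
rem File size in bytes, including the CR/LF at the end of each line.
for %%F in ("1.txt") do set "bytes=%%~zF"
rem Number of lines; each line becomes one /C:"..." search string.
for /F %%n in ('find /c /v "" ^< "1.txt"') do set "lines=%%n"
rem 19 = length of "findstr /B /E /L /V".
rem Per line: 6 added characters ( /C:"") minus the 2 bytes of CR/LF.
set /a estimate = 19 + bytes + 4 * lines
echo lines: %lines%, estimated command length: %estimate% of 8191
endlocal

The result is an estimate only; the file name appended at the end of the command and a last line without CR/LF shift it by a few characters.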

Another algorithm might be possible, but then some more details would be helpful:
- Which characters are involved: only 0x20-0x7F (hexadecimal values), or more (-> set/P or for/F)?
- What are the maximum sizes of the files "1.txt" and "2.txt"?


penpen

axi92
Posts: 6
Joined: 02 Jun 2016 04:22

#6 Post by axi92 » 05 Jun 2016 22:42

File size is not the issue; the largest file is about 80 lines^^
1.txt is about 3.26 KB and 2.txt is 949 bytes.

I don't want to post my codes here because I don't want Google to index these lines.
Here is the dropbox link: https://www.dropbox.com/s/82blon9i91svnp7/test.zip?dl=0

Compo
Posts: 600
Joined: 21 Mar 2014 08:50

#7 Post by Compo » 06 Jun 2016 01:50

I get an output.txt of 34 lines using your supplied data and this:

Code: Select all

findstr /V /X /G:"1.txt" "2.txt" > output.txt

axi92
Posts: 6
Joined: 02 Jun 2016 04:22

#8 Post by axi92 » 06 Jun 2016 02:29

It works better than the previous code, but the word:

Code: Select all

spacetime7ap46rr6

is marked as missing, I think because of the additional information that follows it in 1.txt (" US2, 5 X2, 5 R2, FA").

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

#9 Post by penpen » 06 Jun 2016 03:07

My first script does not work, because findstr seems to have a limit of 100 "/c:" search strings and the sample file "1.txt" contains 104 lines.
I had never noticed this limitation in findstr before.

So this may help you:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion

> "1.txt.tmp" (
   set "line="
   for /F %%a in ('findstr /B /I /G:"2.txt" "1.txt" ^| sort') do if not "%%~a" == "!line!" (
      set "line=%%~a"
      echo(%%~a
   )
)
> "output.txt" FindStr /V /X /I /G:"1.txt.tmp" "2.txt"
del "1.txt.tmp"

endlocal
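
To see in advance whether the first script would run into that apparent limit, the lines of "1.txt" can be counted first. A minimal sketch; the threshold of 100 is only the value observed above, not a documented one:

Code: Select all

@echo off
setlocal enableExtensions
set "lines=0"
rem Count the lines of "1.txt"; the first script turns each line into one /C: string.
for /F %%n in ('find /c /v "" ^< "1.txt"') do set "lines=%%n"
rem 100 is the limit observed in this thread, not a documented value.
if %lines% gtr 100 (
   echo 1.txt has %lines% lines: too many for one findstr call with /C: strings.
) else (
   echo 1.txt has %lines% lines.
)
endlocal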


penpen

axi92
Posts: 6
Joined: 02 Jun 2016 04:22

#10 Post by axi92 » 06 Jun 2016 05:28

Perfect, that's the solution =) Thanks!

EDIT:
After some testing: the script always puts the last line into output.txt, no matter what it is^^?

Compo
Posts: 600
Joined: 21 Mar 2014 08:50

#11 Post by Compo » 06 Jun 2016 05:42

What is wrong with this then?

Code: Select all

@PushD %~dp0
@For /F %%a In (2.txt) Do @FindStr/R "^%%a\>" 1.txt>Nul 2>&1||Echo(%%a>>output.txt

If you want a case-insensitive search, change it to:

Code: Select all

@PushD %~dp0
@For /F %%a In (2.txt) Do @FindStr/IR "^%%a\>" 1.txt>Nul 2>&1||Echo(%%a>>output.txt

Just make sure you don't have an existing output.txt.
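
One way to guarantee that is to delete any old output.txt before the loop runs. This is a sketch of the same loop in multi-line form, assuming the batch file sits in the same folder as both text files:

Code: Select all

@echo off
pushd "%~dp0"
rem Start from a clean slate, because the loop below only appends (>>).
if exist "output.txt" del "output.txt"
rem For each first word in 2.txt: if no line of 1.txt starts with it, log it.
for /F %%a in (2.txt) do findstr /R "^%%a\>" "1.txt" > nul 2>&1 || echo(%%a>>"output.txt"
popd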

axi92
Posts: 6
Joined: 02 Jun 2016 04:22

#12 Post by axi92 » 06 Jun 2016 22:41

Thanks, the second code is the right one =)

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

#13 Post by penpen » 07 Jun 2016 02:09

axi92 wrote:After some testing the script puts always the last line into the output.txt no matter what it is^^?
If possible, you should end the last line of "2.txt" with a carriage return and a line feed, like all other lines.
As long as these characters are missing, the end of that line does not match.

Or you could work on a copy with that line ending appended:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion

> "1.txt.tmp" (
   set "line="
   for /F %%a in ('findstr /B /I /G:"2.txt" "1.txt" ^| sort') do if not "%%~a" == "!line!" (
      set "line=%%~a"
      echo(%%~a
   )
)

rem Work on a copy of "2.txt" and append (>>) a line ending to it.
copy "2.txt" "2.txt.tmp" > nul
>> "2.txt.tmp" echo(

> "output.txt" FindStr /V /X /I /G:"1.txt.tmp" "2.txt.tmp"
del "1.txt.tmp" "2.txt.tmp"

endlocal


Compo wrote:What is wrong with this then?
If a name contains a metacharacter of findstr regular expressions, then it could generate a false positive in the findstr search:
For example, if "1.txt" contains the name "upside" and "2.txt" contains "ups.de", then findstr finds "upside", and "ups.de" is not listed as missing.
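
This can be reproduced with a throwaway file. A minimal sketch, where "names.tmp" is a hypothetical file name used only for the demonstration:

Code: Select all

@echo off
rem Create a one-line test file that contains only "upside".
> "names.tmp" echo upside
rem Regular expression search: "." stands for any character, so "upside" matches.
findstr /R /C:"^ups.de\>" "names.tmp" > nul && echo regex search: match
rem Literal search (/L): "." is an ordinary dot, so "upside" does not match.
findstr /L /B /C:"ups.de" "names.tmp" > nul || echo literal search: no match
del "names.tmp"

A literal search avoids the false positive, but it cannot use the "\>" word boundary, so a literal prefix such as "ada" would also match a longer name like "ada9yv83mp5".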


penpen

Compo
Posts: 600
Joined: 21 Mar 2014 08:50

#14 Post by Compo » 07 Jun 2016 08:50

penpen wrote:
Compo wrote:What is wrong with this then?
If a name contains a metacharacter of findstr regular expressions, then it could generate a false positive in the findstr search:
For example if "1.txt" contains the name "upside", and "2.txt" contains "ups.de", then findstr finds "upside" and "ups.de" is not listed as missing.


penpen

Other than a few entries using question marks at the end, there were no entries using problem metacharacters and therefore no need to cater for that possibility. (Your observation may be of use to the OP should they be aware of something that was not properly catered for in their sample data).
