Search in 1.txt for missing entrys from 2.txt

Posted: 02 Jun 2016 05:19
by axi92
I have 2 .txt files and the lines are like this:

1.txt

Code: Select all

ezekiel7eu89au4 
drone5sg25ez6
kureze3ft26jc6          5 R3, 5 X3, S, T
ni7up28fu6             5 R4, 5 X4, S, LA
drone9rc88jy5          S, Multi-hack, 5 X4, 5 R4


2.txt

Code: Select all

ada3zc36qq9
blub
ada9yv83mp5
algorithm9gh35cj3
bletchley9ob65ca4


So I want to find the words in 2.txt that are NOT in 1.txt.

I tried this, but it did not work:

Code: Select all

findstr /V /B /N /g:stamm.txt /f:suchindex.txt >> output.txt

The output is this:

Code: Select all


...\Neuer Ordner>findstr /V /B /N /g:stamm.txt /f:suchindex.txt  1>>output.txt

...\Neuer Ordner>findstr /V /B /N /g:stamm.txt /f:suchindex.txt  1>>output.txt

...\Neuer Ordner>findstr /V /B /N /g:stamm.txt /f:suchindex.txt  1>>output.txt
........

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 02 Jun 2016 17:44
by penpen
If the file "1.txt" is not too big and no poison characters are possible ('%', '!', ...),
then you could create a single findstr expression:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
set "search=findstr /B /E /L /V"
for /F "usebackq tokens=* delims=" %%a in ("1.txt") do (
   set "search=!search! /C:"%%a""
)
%search% "2.txt"
endlocal


penpen

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 02 Jun 2016 23:21
by axi92
Thanks for the code, but when I start the batch script the window stays black...?
Is it possible to export the lines from 2.txt that were not found in 1.txt into output.txt?
Then I can easily copy them into 1.txt.

I renamed 1.txt to stamm.txt and 2.txt to suchindex.txt:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
set "search=findstr /B /E /L /V"
for /F "usebackq tokens=* delims=" %%a in ("stamm.txt") do (
   set "search=!search! /C:"%%a""
)
%search% "suchindex.txt"
endlocal
PAUSE



And at the end the variable %search% contains this:

Code: Select all

findstr /B /E /L /V /C:"johnson3ba26qb2" /C:"jackland8vf92qz5 " /C:"ezekiel7eu89au4 " /C:"drone5sg25ez6 " /C:"kureze3ft26jc6          5 R3, 5 X3, S, T" /C:"ni7up28fu6             5 R4, 5 X4, S, LA" /C:"drone9rc88jy5          S, Multi-hack, 5 X4, 5 R4" /C:"glyph7jb25yw3          S, LA, 5 X2, 5 R2" /C:"ezekiel3xh34ug4       S, 5 X1, 5 R1, FA" /C:"susanna3ku75cm9       US3, Multi-hack, 5 X3, 5 R3" /C:"ingress3nd85fu9       S, 5 X3, 5 R3, Heat Sink" /C:"blue2xc26da2         US3, T, 5 X3, 5 R3" /C:"ada9yv83mp5          US3, Multi-hack, 5 X3, 5 R3" /C:"vi8zu85il7             50 XM, 50 AP, 4 X1, 4 R1" /C:"vi9rp62ex1            50 XM, 50 AP, 4 X1, 4 R1" /C:"lightman4tm34zf3      US4, 5 X4, 5 R4, Heat Sink" /C:"algorithm9ek27ux3      S, T, 5 X3, 5 R3" /C:"glyph6yt84kt8         S, LA, 5 X4, 5 R4" /C:"johnson4yn13db2         US3, FA, S, T, 5 X4, 5 R4   " /C:"eVoLvE7Yo65nm9         US3, LA, 5 X3, 5 R3" /C:"phillips6wc29mc7      S, 5 X4, 5 R4, Heat Sink" /C:"blue3dg99cm6         US3, T, 5 X3, 5 R3" /C:"hubert6db54fa6         US4, 5 X4, 5 R4, FA" /C:"green7dv85mp8         S, LA, 5 X2, 5 R2" /C:"spacetime7ap46rr6      US2, 5 X2, 5 R2, FA" /C:"CASSANDRA2YU35CP6      US1, T, 5 X1, 5 R1" /C:"resonate3yd72he7      US1, 5 X1, 5 R1, FA" /C:"resonate6wb48ec4      US4, 5 X4, 5 R4, FA" /C:"johnson3fx84aw9         US3, LA, 5 X3, 5 R3" /C:"lightman8nd48zb2      US2, 5 X2, 5 R2, Heat Sink" /C:"message5ka73rp4         S, Multi-hack, 5 X4, 5 R4" /C:"devra2gt69qx7         US4, 5 X4, 5 R4, Heat Sink" /C:"powercube5yn73em6      S, 5 X1, 5 R1, FA" /C:"roland7br76tp5         S, 5 X1, 5 R1, FA" /C:"lvboynxaie            1000 XM, 10 C1, 2 Multi-hack, 10 R1, 2 Heat Sink" /C:"vi9uh67mo6            50 XM, 50 AP, 4 X1, 4 R1" /C:"minotaur8bb28et5      T, US2, 5 X2, 5 R2" /C:"message6ca48vf7         S, Multi-hack, 5 X1, 5 R1" /C:"cube8MK95JJ7         US4, 5 X4, 5 R4, Heat Sink" /C:"kjwbwy46yw            15 C2, 10 US3, 20 R1" /C:"jackland4dz47yf6      US2, Multi-hack, 5 X2, 5 R2" /C:"evolution2gu76gm3      
S, 5 X1, 5 R1, Heat Sink" /C:"creativity4pb44pf6      S, 5 X2, 5 R2, Heat Sink" /C:"alignment9nb75yo5      S, 5 X3, 5 R3, Heat Sink" /C:"artifact3ne73hh3      5 R1, 5 X1, S, FA" /C:"artifact4tt67xg9      s, 5 X1, 5 R1, FA" /C:"symbols4ye57bs7         5 R3, 5 X3, US3, MH" /C:"roland8cx62mk4" /C:"jarvis2kn66cz2         US3, T, 5 X3, 5 R3" /C:"inveniri2he69ar3      5 R2, 5 X2, US2, FA" /C:"inveniri2hc78yy4      5 Rs, 5 X1, US1, FA" /C:"hulong7tr85ub6" /C:"hubert4su42qt2         " /C:"field4mo46jx6         S, LA, 5 X3, 5 R3" /C:"farlowe2ft72ym5         S, T, 5 X2, 5 R2" /C:"deaddrop7dt73am6      S, T, 5 X4, 5 R4" /C:"deaddrop6bf98mr2      S, T, 5 X3, 5 R3" /C:"cube8mk95jj7         US4, 5 X4, 5 R4, HS" /C:"evolution6xu68ru7      5 R1, 5 X1, S, HS" /C:"creativity2pc98zp5      S, 5 X2, 5 R2, HS" /C:"creative3vk97yv4      US3, 5 X3, 5 R3, Heat Sink" /C:"susanna7og34vw3         US4, Multi-hack, 5 X4, 5 R4" /C:"vi2jo15nd0            50 AP, 50 XM, 4 R1, 4 X1" /C:"timezero2qm72ut8      S, LA, 5 X1, 5 R1" /C:"80JDFITMAR            5 C8, 5 US8, 10 X8, 10 R8, 5 S            x" /C:"alignment3qh24up8      S, 5 X1, 5 R1, HS                     x" /C:"glyphs4gt83bm4?         US1, LA, 5 X1, 5 R1                     x" /C:"3OUV7WU5A4" /C:"spacetime4je35kf5      US4, 5 X4, 5 R4, FA                     x" /C:"cassandra3wh77rg4      T, US2, 5 X2, 5 R2                     x" /C:"cern2vb46cn7         S, Multi-hack, 5 X1, 5 R1               x" /C:"vi9bb02fk7            50 AP, 50 XM, 4 R1, 4 X1" /C:"ingress9tu32jk7         5 4R, 5 4X, S, HS" /C:"80jdfitmar?            
5 C8, 5 US8, 10 X8, 10 R8, 5 S" /C:"Fully redeemed:" /C:"algorithm9gh35cj3" /C:"bletchley9ob65ca4" /C:"conflict5av38pw2" /C:"field5jk36yh6" /C:"portal7cc88cd2" /C:"ada3zc36qq9" /C:"cern5wu99oq2" /C:"chaotic5gg23pf9" /C:"cube8aa87xd2" /C:"timezero2kk78gx5      " /C:"moyer5pp56fg2" /C:"tycho7vu99ta2" /C:"niantic9ns77ww9" /C:"green3ou25jt4" /C:"evolve5uu33zd4" /C:"jarvis5ye63mv9" /C:"glyphs6gj75yq2" /C:"kureze2sg38gt2" /C:"Need Testing:" /C:"minotaur8dm83gg5" /C:"moyer4wr38qz8" /C:"niantic4rv29wc6" /C:"powercube3hu72ut7" /C:"tycho9uo99qa2" /C:"voynich6sx52zr5" /C:"wolfe7jq38cj3" /C:"4apz5symbolsx9u9b" /C:"creative3nc46wp7" /C:"Cube8aa87xd2?" 

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 03 Jun 2016 06:44
by Squashman
So basically what you are trying to do is merge two files together without any duplicate lines?

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 03 Jun 2016 17:38
by penpen
axi92 wrote:when i start the batch script the window stays black...?
Is it possible to export the missing lines from 2.txt that were not found in the 1.txt into output.txt?
The above script does exactly this, and it works with the above test data (== all lines of 2.txt are found)!
If the window stays black, then one of 3 things (or more?) may be happening:
- there is no such missing line in 2.txt,
- the script may take longer than you expect, or
- somehow a findstr bug is triggered... (but i doubt this).
Maybe you could link some real world data, and post what you expect to see, so i could check the results.

The content of your tested file ("Stamm.txt") seems to be short enough (resulting command size in characters is ~ 4KB < 8KB),
but maybe you have additional files that may have problematic lengths, so the above script may fail on such files.

Maybe one could use another algorithm, but then some more details would be helpful:
- Which characters are involved: only 0x20-0x7F (hexadecimal values), or more (-> set/P or for/F)?
- How long are the maximum sizes of the files "1.txt" and "2.txt"?


penpen

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 05 Jun 2016 22:42
by axi92
File size is not the issue; the maximum file size is about 80 lines ^^
1.txt is about 3.26 KB and 2.txt is 949 bytes.

I don't want to post my codes here because I don't want Google to index these lines...
Here is the dropbox link: https://www.dropbox.com/s/82blon9i91svnp7/test.zip?dl=0

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 06 Jun 2016 01:50
by Compo
I get an output.txt of 34 lines using your supplied data and this:

Code: Select all

FindStr/VXG:"1.txt" "2.txt">output.txt

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 06 Jun 2016 02:29
by axi92
It works better than the earlier scripts, but the word:

Code: Select all

spacetime7ap46rr6

is marked as missing, I think because of the additional information that follows it in 1.txt (" US2, 5 X2, 5 R2, FA").
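
For anyone following along, the difference between comparing whole lines and comparing only the first word can be sketched outside of batch. The following is a minimal Python sketch for illustration only, not one of the thread's scripts: the names `first_word` and `missing_codes` are made up, the comparison is assumed to be case-insensitive, and the sample lines are taken from the posts above.

```python
def first_word(line):
    """Return the first whitespace-separated word of a line, or '' if blank."""
    parts = line.split()
    return parts[0] if parts else ""


def missing_codes(stamm_lines, search_lines, whole_line=False):
    """List entries of search_lines that have no counterpart in stamm_lines.

    whole_line=True compares complete lines (similar in spirit to findstr /X);
    whole_line=False compares only the first word of each line.
    """
    key = (lambda s: s.strip()) if whole_line else first_word
    # Lower-case the keys so the comparison ignores case (an assumption).
    known = {key(line).lower() for line in stamm_lines}
    result = []
    for line in search_lines:
        k = key(line)
        if k and k.lower() not in known:
            result.append(k)
    return result


stamm = [
    "drone5sg25ez6",
    "spacetime7ap46rr6      US2, 5 X2, 5 R2, FA",
]
search = ["spacetime7ap46rr6", "blub"]

print(missing_codes(stamm, search, whole_line=True))   # whole-line comparison
print(missing_codes(stamm, search, whole_line=False))  # first-word comparison
```

With whole-line comparison the code followed by extra columns counts as missing; with first-word comparison only "blub" is reported.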

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 06 Jun 2016 03:07
by penpen
The script I posted above does not work, because findstr seems to have a limit of 100 "/c:" search strings and the sample file "1.txt" contains 104 lines.
I never noticed this limitation in findstr before.

So this may help you:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion

> "1.txt.tmp" (
   set "line="
   for /F %%a in ('findstr /B /I /G:"2.txt" "1.txt" ^| sort') do if not "%%~a" == "!line!" (
      set "line=%%~a"
      echo(%%~a
   )
)
> "output.txt" FindStr /V /X /I /G:"1.txt.tmp" "2.txt"
del "1.txt.tmp"

endlocal


penpen

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 06 Jun 2016 05:28
by axi92
Perfect, that's the solution =) Thanks.

EDIT:
After some testing: the script always puts the last line into output.txt, no matter what it is ^^?

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 06 Jun 2016 05:42
by Compo
What is wrong with this then?

Code: Select all

@PushD %~dp0
@For /F %%a In (2.txt) Do @FindStr/R "^%%a\>" 1.txt>Nul 2>&1||Echo(%%a>>output.txt

If you want a case-insensitive search, change it to:

Code: Select all

@PushD %~dp0
@For /F %%a In (2.txt) Do @FindStr/IR "^%%a\>" 1.txt>Nul 2>&1||Echo(%%a>>output.txt

Just make sure you don't have an existing output.txt.

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 06 Jun 2016 22:41
by axi92
Thanks, the second code is the right one =)

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 07 Jun 2016 02:09
by penpen
axi92 wrote:After some testing the script puts always the last line into the output.txt no matter what it is^^?
If possible, you should end the last line (in "2.txt") with a carriage return and a newline (like all other lines).
As long as these characters are missing, the end of that line doesn't match.

Or you could add a copy with that end of line:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion

> "1.txt.tmp" (
   set "line="
   for /F %%a in ('findstr /B /I /G:"2.txt" "1.txt" ^| sort') do if not "%%~a" == "!line!" (
      set "line=%%~a"
      echo(%%~a
   )
)

rem Append (>>) an empty line to the copy; a single ">" would overwrite it.
copy "2.txt" "2.txt.tmp" > nul
>> "2.txt.tmp" echo(

> "output.txt" FindStr /V /X /I /G:"1.txt.tmp" "2.txt.tmp"
del "1.txt.tmp" "2.txt.tmp"

endlocal
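
The missing line end on the last line can also be checked and repaired outside of batch. This is a minimal Python sketch for illustration only, working on raw bytes; the function name `ensure_crlf_at_end` is made up, and it assumes the file uses CR LF line ends as described above.

```python
def ensure_crlf_at_end(data: bytes) -> bytes:
    """Append CR LF if the data is non-empty and does not end with a line feed."""
    if data and not data.endswith(b"\n"):
        return data + b"\r\n"
    return data


# The last line ("blub") has no CR LF, as in the situation described above.
sample = b"ada3zc36qq9\r\nblub"
fixed = ensure_crlf_at_end(sample)

print(sample.endswith(b"\r\n"))  # line end missing on the original data
print(fixed.endswith(b"\r\n"))   # line end present after the repair
```

Data that already ends with a line end is returned unchanged, so the repair can be applied more than once without adding empty lines.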


Compo wrote:What is wrong with this then?
If a name contains a metacharacter of findstr regular expressions, then the findstr search could produce a false match:
For example, if "1.txt" contains the name "upside" and "2.txt" contains "ups.de", then findstr finds "upside", so "ups.de" is not listed as missing.
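
The same effect can be reproduced with a small Python sketch, for illustration only. Python's `re` module is not findstr's regular expression engine, `\b` is used here as a stand-in for findstr's `\>`, and the function names are made up.

```python
import re


def found_unescaped(name, line):
    """Search for name at the start of line, treating name as a regular expression."""
    return re.search(r"^" + name + r"\b", line) is not None


def found_escaped(name, line):
    """Same search, but with regex metacharacters in name escaped first."""
    return re.search(r"^" + re.escape(name) + r"\b", line) is not None


# "." matches any character, so the unescaped pattern also matches "upside".
print(found_unescaped("ups.de", "upside"))
# With the dot escaped, "upside" no longer matches ...
print(found_escaped("ups.de", "upside"))
# ... while a line that really starts with "ups.de" still does.
print(found_escaped("ups.de", "ups.de   S, T"))
```

In a findstr regular expression the dot can be escaped with a backslash in the same way.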


penpen

Re: Search in 1.txt for missing entrys from 2.txt

Posted: 07 Jun 2016 08:50
by Compo
penpen wrote:
Compo wrote:What is wrong with this then?
If a name contains a metacharacter of findstr regular expressions, then the findstr search could produce a false match:
For example, if "1.txt" contains the name "upside" and "2.txt" contains "ups.de", then findstr finds "upside", so "ups.de" is not listed as missing.


penpen

Other than a few entries using question marks at the end, there were no entries using problem metacharacters and therefore no need to cater for that possibility. (Your observation may be of use to the OP should they be aware of something that was not properly catered for in their sample data).