How to findstr strings that are German?

Posted: 19 Jun 2014 22:34
by pditty8811
I'm trying to findstr or compare two strings in batch which will contain German words, like: Seebrügge
(notice the accent)

How do I do this? I know how to write batch, but batch isn't cooperating with me on this. I'm searching .txt files for words like Seebrügge, or comparing variables that may contain German words.

Any idea?

Re: How to findstr strings that are German?

Posted: 19 Jun 2014 23:45
by foxidrive
You have to use a German command processor, or an English one with a translator.
On the other hand, a Babel Fish routine could be written to handle this...

If I have misunderstood your needs, then please provide better details...

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 00:28
by pditty8811
Thanks foxi for your quick reply. I suppose a Babel routine would work. I would need to translate it from German to English, then back when I am finished. Know where I can find such a function, that I can call?

Or how do we go about this?

Say I have a text file with several German city names in it, like so:

Code:

Saarbrücken
Nürnberg
Fürth
Wipperfürth
Tübingen
Tübingen
München
Kitzbühel
Märkisch-Buchholz
Fünfkirchen


And I'd like to findstr on the file looking for strings like Fürth to see if it is there. How would I go about doing that?

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 00:43
by foxidrive
What command have you tried, and how did it fail when looking for Fürth ?

The people here who use non-Latin character sets will know more than I do, but with some characters the code page has to be set with CHCP before you use the special characters.

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 01:16
by pditty8811

Code:

findstr /c:

Does not work.

However, this does work:

Code:

@echo off
setlocal EnableDelayedExpansion
rem Compare every line of test.txt with the city name
for /f "tokens=*" %%f in (test.txt) do (
   set "var1=%%f"
   echo !var1!
   if "!var1!"=="Brunsbüttel" (
      echo FOUND Brunsbüttel
   ) else (
      echo NOT FOUND AGAIN
   )
)


But I fear it will take more processing time than findstr. Is that true? Does a for loop take longer?

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 02:04
by miskox
We also have characters that are not in the English alphabet. All you have to do is:
- convert the original file from the code page you have in Windows (I have CP1250) to your DOS code page (I have CP852)
- then you should be able to type the word Brunsbüttel in the DOS window (because the ü is in your DOS code page) and perform a search

Another option is to use a dot "." as a wildcard character in FINDSTR, provided the pattern cannot match anything else in your data:

Brunsb.ttel (see regular expressions)

Saso
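
The conversion step described in this post (Windows code page to DOS code page) is a byte-level re-encoding. A minimal sketch of that idea, written in Python rather than batch and assuming the file really is CP1250 and the console really uses CP852 as in the post; `convert_codepage` is a hypothetical helper name, not an existing tool:

```python
def convert_codepage(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw bytes from code page `src` to code page `dst`."""
    # Decode with the code page the file was saved in, then encode
    # with the code page the console (and FINDSTR) will use.
    return data.decode(src).encode(dst)


# "Brunsbüttel" as a Windows editor using CP1250 would store it
windows_bytes = "Brunsbüttel".encode("cp1250")
dos_bytes = convert_codepage(windows_bytes, "cp1250", "cp852")

print(windows_bytes)  # the ü is a single byte in CP1250
print(dos_bytes)      # the same ü is a different single byte in CP852
```

A character that does not exist in the target code page makes `encode` raise `UnicodeEncodeError`, which is a useful warning that the conversion would lose information.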

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 03:22
by pditty8811
I think I'll stay with the for loops. It's working so that's fine.

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 07:51
by dbenham
The reason FINDSTR is not working is that FINDSTR fails to work with many extended ASCII characters (code > 127) if they are used on the command line. If an extended ASCII character such as the "ü" (dependent on your code page) is used on the command line in a search string, then FINDSTR treats it as a normal ASCII character. In this case I believe it matches "u". More information is available at http://stackoverflow.com/questions/8844 ... 73#8844873.

The way to work around the problem using FINDSTR is to write the search string to a file and use the /g:searchFile option.

Or you can get yourself a free port of the grep utility for Windows from GNU.

Or you can use the FOR loops, but they are relatively slow if working with large files.


Dave Benham
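
As a cross-check that is independent of FINDSTR and of the console code page, the file can be searched with the encoding stated explicitly. This is not the /g: workaround from the post, only a small Python sketch of a different technique; `find_lines` is a hypothetical helper and the CP1252 encoding of the sample data is an assumption:

```python
def find_lines(data: bytes, needle: str, encoding: str) -> list:
    """Return the decoded lines of `data` that contain `needle`."""
    # Decoding with an explicit encoding removes any dependence on the
    # active console code page.
    text = data.decode(encoding)
    return [line for line in text.splitlines() if needle in line]


# Sample data standing in for a city list saved as CP1252 (assumption)
sample = "Nürnberg\r\nFürth\r\nWipperfürth\r\n".encode("cp1252")
hits = find_lines(sample, "Fürth", "cp1252")
print(hits)
```

For a real file, the bytes could come from `pathlib.Path("test.txt").read_bytes()`, with the encoding replaced by whatever the file was actually saved in.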

Re: How to findstr strings that are German?

Posted: 20 Jun 2014 13:43
by miskox
dbenham wrote:The reason FINDSTR is not working is that FINDSTR fails to work with many extended ASCII characters (code > 127) if they are used on the command line. If an extended ASCII character such as the "ü" (dependent on your code page) is used on the command line in a search string, then FINDSTR treats it as a normal ASCII character. In this case I believe it matches "u". More information is available at http://stackoverflow.com/questions/8844 ... 73#8844873.

The way to work around the problem using FINDSTR is to write the search string to a file and use the /g:searchFile option.

Or you can get yourself a free port of the grep utility for Windows from GNU.

Or you can use the FOR loops, but they are relatively slow if working with large files.


Dave Benham


My experience is different. Try this code:

Code:

@echo off

chcp

if exist 850.txt  del  850.txt
if exist 1250.txt del 1250.txt

for %%b in (
"4D53434600000000AB000000000000002C00000000000000030101000100000000000000440000000100010073000000"
"000000000000D4440CAA20003835302E74787400437C85145F007300434B0B4E4C2C4A2A6A4CCE4ECDE3E5F26B2CCA4B"
"4A2D4AE7E5726B2C2AC9E0E50ACF2C28482D4A8370421A9332F3D241EA102CDFC6BCE40C10C33BB3A42AA93123350728"
"F6A4283BB3383943D7A9343923233FA70A645C5E5A766611442900") Do >>850.txt (Echo.For b=1 To len^(%%b^) Step 2
Echo WScript.StdOut.Write Chr^(CByte^("&H"^&Mid^(%%b,b,2^)^)^) : Next)
Cscript /b /e:vbs 850.txt > 850.tx_
Expand -r 850.tx_ 850.txt>nul 2>&1
if exist 850.tx_ del 850.tx_

for %%b in (
"4D53434600000000AD000000000000002C00000000000000030101000100000000000000450000000100010073000000"
"000000000000D444E5A92000313235302E74787400CB1BA91260007300434B0B4E4C2C4A2AFA939C9D9AC7CBE5F7A728"
"2F29B5289D97CBED4F5149062F57786641416A511A8413F22729332F1DA40EC1F2FD93979C01627867965425FDC948CD"
"018A3D29CACE2C4ECED0752A4DCEC8C8CFA902199797969D5904510A00") Do >>1250.txt (Echo.For b=1 To len^(%%b^) Step 2
Echo WScript.StdOut.Write Chr^(CByte^("&H"^&Mid^(%%b,b,2^)^)^) : Next)
Cscript /b /e:vbs 1250.txt > 1250.tx_
Expand -r 1250.tx_ 1250.txt>nul 2>&1
if exist 1250.tx_ del 1250.tx_


echo Files 850.txt and 1250.txt created

findstr /I /C:"%1" *.txt

echo And now find CP1250 character:

findstr /I /C:"Saarbrücken" *.txt



File 1250.txt is encoded in CP1250 (Windows ANSI). In 850.txt the 'ü' is stored as 0x81 (129 decimal), which is the value of the letter 'ü' in both CP850 and CP852.

Start the test.cmd from the DOS prompt with test.cmd Saarbrücken or with

test.cmd SaarbrÜcken

Output:

Code:

c:\>test.cmd Saarbrücken
Active code page: 852
Files 850.txt and 1250.txt created
850.txt:Saarbrücken
And now find CP1250 character:
1250.txt:SaarbrŘcken

c:\>



As we can see, it works with CP852 (I think Germany has CP850 in the DOS window).
There is only one match for each FINDSTR command.
Maybe you should change the code page if it does not work for you.

Saso
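
The "SaarbrŘcken" shown for 1250.txt in the output above is consistent with the file holding the CP1250 byte for ü while the console renders it under CP852. A short Python sketch of the byte values involved (Python is used only for illustration; the code pages are the ones named in this post):

```python
# Byte stored for "ü" in each code page mentioned in the post
for codepage in ("cp1250", "cp850", "cp852"):
    print(codepage, "ü".encode(codepage))

# The byte 0xFC (ü in CP1250), rendered the way a CP852 console shows it
shown_in_console = b"Saarbr\xfccken".decode("cp852")
print(shown_in_console)
```

So a match can be correct at the byte level and still look wrong on screen, because display and storage use different code pages.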

Re: How to findstr strings that are German?

Posted: 24 Jun 2014 04:48
by trebor68
The following code needs a parameter (the name of a city) and checks whether it is in the file. Don't use quotes around the parameter.
BATCH Saarbrücken
BATCH Frankfurt (Main)

Code:

@echo off
setlocal EnableExtensions
REM  In Windows, the German characters are at the same positions in CP1252 and CP1250
REM  Check: type Alt+0165 in your editor; if you see the Yen sign, it is CP1252
REM  ECHO "ä"  "ö"  "ü"  "Ä"  "Ö"  "Ü"  "ß"  CP1252
REM  ECHO "„"  "”"  ""  "Ž"  "™"  "š"  "á"  CP850
REM  ECHO "ae" "oe" "ue" "Ae" "Oe" "Ue" "ss"  alternate input


if "%1" EQU "" (echo no parameter) & goto :eof
set sstr=%*
set wstr=%*

rem convert WSTR <==> word string (input / output to file) - Alt+xxx (CP850) = Windows character (CP1252)
rem row order: ae, oe, ue, Ae, Oe, Ue, ss
set wstr=%wstr:„=ä%
set wstr=%wstr:”=ö%
set wstr=%wstr:=ü%
set wstr=%wstr:Ž=Ä%
set wstr=%wstr:™=Ö%
set wstr=%wstr:š=Ü%
set wstr=%wstr:á=ß%

rem convert SSTR <==> search string (regular expression) - Alt+xxx (CP850) = Windows character (CP1252)
rem the characters Ae, Ue, ss and space are converted to "." (dot)
rem row order: ae, oe, ue, Ae, Oe, Ue, ss, space
set sstr=%sstr:„=ä%
set sstr=%sstr:”=ö%
set sstr=%sstr:=ü%
set sstr=%sstr:Ž=.%
set sstr=%sstr:™=Ö%
set sstr=%sstr:š=.%
set sstr=%sstr:á=.%
set sstr=%sstr: =.%

echo "%wstr%"  --  "%sstr%"
echo.

set found=NO
for /f "tokens=*" %%a in ('findstr /i /r "%sstr%" UmlauteStadt.txt') do if "%%a" EQU "%wstr%" set found=YES
if %found% EQU NO echo %wstr%>>UmlauteStadt.txt


The 5th row contains the characters from CP1252, from left to right:
0xE4 0xF6 0xFC 0xC4 0xD6 0xDC 0xDF
Alt+0228 Alt+0246 Alt+0252 Alt+0196 Alt+0214 Alt+0220 Alt+0223 - input in the file

The 6th row contains the characters in the same order, but as the CP850 characters that are written from the prompt to the file.
Alt+0132 Alt+0148 Alt+0129 Alt+0142 Alt+0153 Alt+0154 Alt+0225 - input in the file; Alt+0129 not possible

Input from prompt: ECHO "ä" "ö" "ü" "Ä" "Ö" "Ü" "ß" CP850 >>file
In the file you will find all seven characters between the quotes.

The 7th row gives an alternative spelling of these characters, for when it is not possible to write the correct characters.


To convert the WSTR: set wstr=%wstr:„=ä%

To convert the SSTR, the same commands are used, but not for all characters. Four characters are changed to "." (dot) instead.
The last character converted in this block is the space character.


FINDSTR cannot handle the graphical characters such as lines and blocks.
A space character splits the search string into several search strings.
Example: search "Bad Kleinen" -- find: "Bad Salzungen" or "Baden-Baden"

The "." (dot) represents any one character in the regular expression.
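
The two-stage check in the FOR /F line of the script (a loose regular-expression search, then an exact comparison) can be sketched as follows. This is a Python illustration, not a translation of the batch file: Python's `re` syntax is not the FINDSTR regex dialect, the `re.escape` call is an addition the script does not have, and `to_pattern` and `is_listed` are hypothetical names.

```python
import re

DOT_CHARS = set("ÄÜß ")  # replaced by "." in the SSTR block of the script


def to_pattern(name: str) -> str:
    """Build a regex with "." in place of the characters in DOT_CHARS."""
    return "".join("." if ch in DOT_CHARS else re.escape(ch) for ch in name)


def is_listed(name: str, lines: list) -> bool:
    """Regex pre-filter first, then an exact, case-sensitive comparison."""
    pattern = re.compile(to_pattern(name), re.IGNORECASE)
    return any(pattern.search(line) and line == name for line in lines)


cities = ["Bad Salzungen", "Baden-Baden", "Fürth"]
print(to_pattern("Bad Kleinen"))         # the space becomes a dot
print(is_listed("Bad Kleinen", cities))  # no exact match in the list
print(is_listed("Fürth", cities))        # exact match in the list
```

Turning the space into a dot keeps the name as one search string, and the exact comparison afterwards rejects the loose matches that the dots allow.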

Re: How to findstr strings that are German?

Posted: 30 Jun 2014 03:18
by trebor68
This code creates a batch file that shows the differences between CP1252 and CP850 for the German letters.

Hint: Microsoft Windows OS versions newer than XP contain certutil.exe. Older OS versions may be able to install certutil.exe as part of another package, e.g. the Windows 2003 Server Service Pack 1 version of adminpak.

Code:

@echo off
set file=%~n0_tmp
if exist %file%.bat del %file%.bat

(
echo 406563686F206F66660D0A4543484F2022E422202022F622202022FC22202022
echo C422202022D622202022DC22202022DF222020204350313235320D0A4543484F
echo 20228422202022942220202281222020228E2220202299222020229A22202022
echo E12220202043503835300D0A4543484F202261652220226F6522202275652220
echo 2241652220224F6522202255652220227373222020616C7465726E6174650D0A
) >%file%.hex

CertUtil -decodehex %file%.hex %file%.bat
echo.
type %file%.bat
echo.
dir %file%*.*


Open the batch file in an editor.
Each pair of quotation marks encloses exactly one character, except in the last row, where each pair encloses two characters.
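
For readers without certutil, the hex rows from the script above can also be decoded and inspected directly. A minimal Python sketch, assuming the five hex rows are copied unchanged from the post; it shows the same bytes read under both code pages:

```python
# The five hex rows echoed into the .hex file by the batch script
HEX_ROWS = [
    "406563686F206F66660D0A4543484F2022E422202022F622202022FC22202022",
    "C422202022D622202022DC22202022DF222020204350313235320D0A4543484F",
    "20228422202022942220202281222020228E2220202299222020229A22202022",
    "E12220202043503835300D0A4543484F202261652220226F6522202275652220",
    "2241652220224F6522202255652220227373222020616C7465726E6174650D0A",
]

payload = bytes.fromhex("".join(HEX_ROWS))
rows = payload.split(b"\r\n")  # lines of the generated batch file

print(rows[1].decode("cp1252"))  # row written with CP1252 byte values
print(rows[2].decode("cp850"))   # row written with CP850 byte values
# Reading the CP850 row as CP1252 gives different glyphs; byte 0x81 has
# no CP1252 character, so it is shown as a replacement character here.
print(rows[2].decode("cp1252", errors="replace"))
```

The first two prints show the same seven German letters, which is the point of the generated file: identical characters, different byte values per code page.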