How to findstr strings that are German?
Moderator: DosItHelp
-
- Posts: 184
- Joined: 21 Feb 2013 15:54
How to findstr strings that are German?
I'm trying to findstr or compare two strings in batch which will contain German words, like: Seebrügge
(notice the umlaut)
How do I do this? I know how to write batch, but batch isn't cooperating with me on this. I'm searching .txt files for words like Seebrügge, or comparing variables which may contain German words.
Any ideas?
Re: How to findstr strings that are German?
You have to use a German command processor, or an English one with a translator.
On the other hand, a Babel Fish routine could be written to handle this...
If I have misunderstood your needs, then please provide better details...
Re: How to findstr strings that are German?
Thanks foxi for your quick reply. I suppose a Babel routine would work. I would need to translate it from German to English, then back when I am finished. Know where I can find such a function, that I can call?
Or how do we go about this?
Say I have a text file with several German city names within it. Like so
Code: Select all
Saarbrücken
Nürnberg
Fürth
Wipperfürth
Tübingen
Tübingen
München
Kitzbühel
Märkisch-Buchholz
Fünfkirchen
And I'd like to findstr on the file looking for strings like Fürth to see if it is there. How would I go about doing that?
Re: How to findstr strings that are German?
What command have you tried, and how did it fail when looking for Fürth ?
The people here who use non-Latin character sets will know more than I do, but with some characters the code page has to be set with CHCP before you use the special characters.
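The CHCP hint can be sketched as follows. This is only a minimal sketch, assuming a hypothetical data file cities.txt saved in Windows-1252 and a console font that can display umlauts. It shows how to inspect and switch the code page; whether FINDSTR then matches still depends on how the search string reaches it.

```batch
@echo off
rem Show the active console code page, e.g. "Active code page: 850"
chcp
rem Switch the console to Windows-1252 (the assumed encoding of cities.txt)
chcp 1252 >nul
rem If the umlauts now display correctly, file and console use the same code page
type cities.txt
```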
Re: How to findstr strings that are German?
Code: Select all
findstr /c:
However, this does work:
Code: Select all
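@echo off
setlocal EnableDelayedExpansion
rem Delayed expansion is required for !var1! inside the loop body
for /f "tokens=*" %%f in (test.txt) do (
    set "var1=%%f"
    echo !var1!
    if "!var1!"=="Brunsbüttel" (
        echo FOUND Brunsbüttel
    ) else (
        echo NOT FOUND AGAIN
    )
)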
But I fear it will take more processing time than findstr. Is that true? Does a for loop take longer?
Re: How to findstr strings that are German?
We also have characters that are not in the English alphabet. All you have to do is:
- convert the original file from the CP you have in Windows (I have CP1250) to your DOS CP (I have CP852)
- then you should be able to enter the word Brunsbüttel in the DOS window (because the letter is in your DOS codepage) and perform a search
Another option is to use a dot "." as a wildcard character in FINDSTR, as long as the pattern cannot match anything else:
Brunsb.ttel (see regular expressions)
Saso
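The dot approach could look like this. A minimal sketch, assuming the file is called test.txt as in the FOR loop earlier in the thread. /r asks FINDSTR to treat the string as a regular expression, where "." matches any single character, so this would also report a line such as Brunsbattel.

```batch
@echo off
rem /r = regular expression, /c:"..." = use the quoted text as one search string
rem The dot stands in for the umlaut, so no byte above 127 is on the command line
findstr /r /c:"Brunsb.ttel" test.txt
rem FINDSTR sets errorlevel 0 when a match was found and 1 when not
if errorlevel 1 (echo NOT FOUND) else (echo FOUND)
```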
Re: How to findstr strings that are German?
I think I'll stay with the for loops. It's working so that's fine.
Re: How to findstr strings that are German?
FINDSTR is not working because it fails to handle many extended ASCII characters (byte value > 127) when they are used on the command line. If an extended ASCII character such as "ü" (dependent on your code page) is used in a search string on the command line, FINDSTR treats it as a normal ASCII character. In this case I believe it matches "u". More information is available at http://stackoverflow.com/questions/8844 ... 73#8844873.
The way to work around the problem using FINDSTR is to write the search string to a file and use the /g:searchFile option.
Or you can get yourself a free port of the grep utility for Windows from gnu.
Or you can use the FOR loops, but they are relatively slow if working with large files.
Dave Benham
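The /g: workaround might be sketched like this. The file names search.txt and cities.txt are hypothetical, and the sketch assumes the batch file is saved in the same encoding as cities.txt, so that the bytes for "ü" agree in both.

```batch
@echo off
rem Write the search string to a file instead of passing it on the command line
>search.txt echo Fürth
rem /l = literal search, /g:file = read the search strings from that file
findstr /l /g:search.txt cities.txt
rem FINDSTR sets errorlevel 0 when a match was found and 1 when not
if errorlevel 1 (echo NOT FOUND) else (echo FOUND)
del search.txt
```

Because the search is case sensitive by default, "Fürth" does not match "Wipperfürth"; adding /i would change that.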
Re: How to findstr strings that are German?
dbenham wrote:FINDSTR is not working because it fails to handle many extended ASCII characters (byte value > 127) when they are used on the command line. If an extended ASCII character such as "ü" (dependent on your code page) is used in a search string on the command line, FINDSTR treats it as a normal ASCII character. In this case I believe it matches "u". More information is available at http://stackoverflow.com/questions/8844 ... 73#8844873.
The way to work around the problem using FINDSTR is to write the search string to a file and use the /g:searchFile option.
Or you can get yourself a free port of the grep utility for Windows from gnu.
Or you can use the FOR loops, but they are relatively slow if working with large files.
Dave Benham
My experience is different. Try this code:
Code: Select all
@echo off
chcp
if exist 850.txt del 850.txt
if exist 1250.txt del 1250.txt
for %%b in (
"4D53434600000000AB000000000000002C00000000000000030101000100000000000000440000000100010073000000"
"000000000000D4440CAA20003835302E74787400437C85145F007300434B0B4E4C2C4A2A6A4CCE4ECDE3E5F26B2CCA4B"
"4A2D4AE7E5726B2C2AC9E0E50ACF2C28482D4A8370421A9332F3D241EA102CDFC6BCE40C10C33BB3A42AA93123350728"
"F6A4283BB3383943D7A9343923233FA70A645C5E5A766611442900") Do >>850.txt (Echo.For b=1 To len^(%%b^) Step 2
Echo WScript.StdOut.Write Chr^(CByte^("&H"^&Mid^(%%b,b,2^)^)^) : Next)
Cscript /b /e:vbs 850.txt > 850.tx_
Expand -r 850.tx_ 850.txt>nul 2>&1
if exist 850.tx_ del 850.tx_
for %%b in (
"4D53434600000000AD000000000000002C00000000000000030101000100000000000000450000000100010073000000"
"000000000000D444E5A92000313235302E74787400CB1BA91260007300434B0B4E4C2C4A2AFA939C9D9AC7CBE5F7A728"
"2F29B5289D97CBED4F5149062F57786641416A511A8413F22729332F1DA40EC1F2FD93979C01627867965425FDC948CD"
"018A3D29CACE2C4ECED0752A4DCEC8C8CFA902199797969D5904510A00") Do >>1250.txt (Echo.For b=1 To len^(%%b^) Step 2
Echo WScript.StdOut.Write Chr^(CByte^("&H"^&Mid^(%%b,b,2^)^)^) : Next)
Cscript /b /e:vbs 1250.txt > 1250.tx_
Expand -r 1250.tx_ 1250.txt>nul 2>&1
if exist 1250.tx_ del 1250.tx_
echo Files 850.txt and 1250.txt created
findstr /I /C:"%1" *.txt
echo And now find CP1250 character:
findstr /I /C:"Saarbrücken" *.txt
File 1250.txt is encoded in CP1250 (Windows ANSI). In 850.txt the 'ü' is stored as 0x81 (129 decimal), which is the value of 'ü' in both CP850 and CP852.
Run test.cmd from the DOS prompt with test.cmd Saarbrücken or with
test.cmd SaarbrÜcken
Output:
Code: Select all
c:\>test.cmd Saarbrücken
Active code page: 852
Files 850.txt and 1250.txt created
850.txt:Saarbrücken
And now find CP1250 character:
1250.txt:SaarbrŘcken
c:\>
As we can see, it works on CP852 (I think Germany uses CP850 in the DOS window).
There is only one match for each FINDSTR command.
Maybe you should change the code page if it does not work for you.
Saso
Re: How to findstr strings that are German?
The following code needs a parameter (the name of a city) and checks whether it is in the file. Don't use quotes in the parameter.
BATCH Saarbrücken
BATCH Frankfurt (Main)
Code: Select all
@echo off
setlocal EnableExtensions
REM the German characters are at the same positions in CP1252 and CP1250
REM check: enter Alt+0165 in your editor; if you see the Yen sign, it is CP1252
REM ECHO "ä" "ö" "ü" "Ä" "Ö" "Ü" "ß" CP1252
REM ECHO "„" "”" "" "Ž" "™" "š" "á" CP850
REM ECHO "ae" "oe" "ue" "Ae" "Oe" "Ue" "ss" alternate input
if "%1" EQU "" (echo no parameter) & goto :eof
set sstr=%*
set wstr=%*
rem convert WSTR <==> word string (input / output to file) - Alt+xxx (CP850) = Windows character (CP1252)
rem row order: ae, oe, ue, Ae, Oe, Ue, ss
set wstr=%wstr:„=ä%
set wstr=%wstr:”=ö%
set wstr=%wstr:=ü%
set wstr=%wstr:Ž=Ä%
set wstr=%wstr:™=Ö%
set wstr=%wstr:š=Ü%
set wstr=%wstr:á=ß%
rem convert SSTR <==> search string (regular expression) - Alt+xxx (CP850) = Windows character (CP1252)
rem characters Ae, Ue, ss and space are converted to "." (dot)
rem row order: ae, oe, ue, Ae, Oe, Ue, ss, space
set sstr=%sstr:„=ä%
set sstr=%sstr:”=ö%
set sstr=%sstr:=ü%
set sstr=%sstr:Ž=.%
set sstr=%sstr:™=Ö%
set sstr=%sstr:š=.%
set sstr=%sstr:á=.%
set sstr=%sstr: =.%
echo "%wstr%" -- "%sstr%"
echo.
set found=NO
for /f "tokens=*" %%a in ('findstr /i /r "%sstr%" UmlauteStadt.txt') do if "%%a" EQU "%wstr%" set found=YES
if %found% EQU NO echo %wstr%>>UmlauteStadt.txt
The 5th row contains the CP1252 characters, from left to right:
0xE4 0xF6 0xFC 0xC4 0xD6 0xDC 0xDF
Alt+0228 Alt+0246 Alt+0252 Alt+0196 Alt+0214 Alt+0220 Alt+0223 - input in the file
The 6th row contains the same characters in the same order, but as the CP850 bytes that are written from the prompt to the file.
Alt+0132 Alt+0148 Alt+0129 Alt+0142 Alt+0153 Alt+0154 Alt+0225 - input in the file; Alt+0129 is not possible
Input from prompt: ECHO "ä" "ö" "ü" "Ä" "Ö" "Ü" "ß" CP850 >>file
In the file you will find all seven characters between the quotes.
The 7th row contains an alternative spelling for these characters, for when it is not possible to type the correct characters.
To convert the WSTR: set wstr=%wstr:„=ä%
To convert the SSTR, use the same commands, but not for all characters. Four characters are changed to "." (dot).
The last character converted in this block is the space character.
FINDSTR cannot use the graphical characters such as lines and blocks.
The space character splits the search string into several search strings.
Example: searching for "Bad Kleinen" also finds "Bad Salzungen" or "Baden-Baden"
The "." (dot) represents any one character in the regular expression.
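The effect of the space can be seen by comparing the two forms below. This is a small sketch using the UmlauteStadt.txt file from the script above. Without /c:, FINDSTR splits the quoted text at the space and searches for "Bad" or "Kleinen"; with /c:, the whole text is one literal search string. The /c: form is an alternative to the dot replacement, not what the script above does.

```batch
@echo off
rem Two search strings, "Bad" OR "Kleinen": lists every line containing either one
findstr /i "Bad Kleinen" UmlauteStadt.txt
echo.
rem One literal search string: lists only lines containing "Bad Kleinen"
findstr /i /c:"Bad Kleinen" UmlauteStadt.txt
```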
Re: How to findstr strings that are German?
This code creates a batch file that shows how the German letters differ between CP1252 and CP850.
Hint: Microsoft Windows OS versions newer than XP contain certutil.exe. On older OS versions, certutil.exe may be installable as part of another package, e.g. the Windows 2003 Server Service Pack 1 version of adminpak.
Code: Select all
@echo off
set file=%~n0_tmp
if exist %file%.bat del %file%.bat
(
echo 406563686F206F66660D0A4543484F2022E422202022F622202022FC22202022
echo C422202022D622202022DC22202022DF222020204350313235320D0A4543484F
echo 20228422202022942220202281222020228E2220202299222020229A22202022
echo E12220202043503835300D0A4543484F202261652220226F6522202275652220
echo 2241652220224F6522202255652220227373222020616C7465726E6174650D0A
) >%file%.hex
CertUtil -decodehex %file%.hex %file%.bat
echo.
type %file%.bat
echo.
dir %file%*.*
Open the batch file in an editor.
There is always exactly one character between each pair of quotation marks, except in the last row, where there are two.