Detecting and removing TABs from a string without prior access to the TAB character

#1 Post by sst » 21 Nov 2020 13:49

Detecting and/or removing TAB characters from a string is a fairly simple task: define the TAB character using one of the many available methods and do a substitution such as !string:%TAB%=!
But I was looking for a way to detect and possibly remove or substitute the TAB characters without resorting to any external command or using a literal TAB in the editor.
I came up with a solution, and it was simpler than I had imagined. Actually, the detection does not require access to the TAB character. For removal or substitution, the TAB character can be extracted from the subject string itself, so it is a fully self-contained solution.

Here is the sample script to demonstrate the method. It is fully commented and is fairly simple to understand.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
set "String=!%~1!"
if not defined String (
    echo Pass the name of the environment variable which contains one or more TAB characters
    exit /b
)
:: Define LineFeed
set ^"LF=^%==%
%==%
%==%"
:: And the escaped one
set "eLF=^^!LF!!LF!"

:: Remove all Spaces
:: And surround the string between two ordinary chars
:: to ensure that there will be no leading or trailing TABs
set "testTAB=#!String: =!#"
:: Remove all double quotation marks
set "testTAB=!testTAB:"=!"
:: Remove all LineFeeds
set ^"testTAB=!testTAB:%eLF%=!"
:: Remove all Bangs !
set "hasBang="
if not "!testTAB!"=="!testTAB:*!=!" (
    setlocal DisableDelayedExpansion
    set "hasBang=1"
    set "testTAB=%testTAB:!=%"
)
if defined hasBang (
    endlocal
    set "testTAB=%testTAB%"
)
:: Detecting the presence of TAB char in the string
set "hasTAB="
for /F "tokens=1,2" %%A in ("!testTAB!") do (
    if not "%%B"=="" (
        REM The default delimiters are <SPACE> and <TAB>
        REM Since the string does not contain any <SPACE> chars,
        REM a non-empty second token proves the presence of the TAB char
        set "hasTAB=1"
        set "lead=%%A"
    )
)
if defined hasTAB (
    echo The String contains the TAB character
    call :strlen len lead
    REM Extract the TAB character from the String itself
    for %%I in (!len!) do set "charTAB=!testTAB:~%%I,1!"
)
if defined hasTAB (
    REM Now remove the TABs from the original string
    set "noTabString=!String:%charTAB%=!"
    echo Original String:     [!String!]
    echo After removing TABs: [!noTabString!]
)
exit /b

:strlen <resultVar> <stringVar>
(
    setlocal EnableDelayedExpansion
    set "s=!%~2!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!s:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "s=!s:~%%P!"
        )
    )
    for %%L in (!len!) do (
        endlocal
        set "%~1=%%L"
    )
    exit /b
)

#2 Post by Squashman » 21 Nov 2020 18:19

As you said, this code only works if there are NO spaces in your string. But why go through all that extra code to remove the tab by getting the string length? You already have the TAB removed with the FOR /F command. And what if your string has multiple tabs? Again, the FOR /F would be more beneficial, as you could just keep checking whether the second token is not blank and use a GOTO to a label before the FOR /F command.

#3 Post by sst » 22 Nov 2020 03:31

Squashman wrote:
21 Nov 2020 18:19
But why go through all that extra code to remove the tab by getting the string length? You already have the TAB removed with the FOR /F command. And what if your string has multiple tabs?

FOR /F cannot be used to remove the TABs from the string because, as you said, the string can contain multiple non-consecutive TABs. It can only be used to detect whether the string contains any TABs.
Moreover, one may not wish to remove the TABs but to substitute them with something else, say SPACES for example.
So for removal or substitution one needs to have access to the TAB character. That is why I used strlen to determine the offset of the first occurrence of the TAB character in the modified string, so that it can be grabbed and then used to apply the substitution to the original string: !string:%TAB%=SomeThingElse!
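
To make the offset idea concrete, here is a minimal sketch under a few assumptions: the sample string is hypothetical, the gap between "two" and "three" has to be a literal TAB in the file (used only to build test input), and a plain counting loop stands in for the :strlen routine from the first post. The length of the first token of the space-free copy is the offset of the first TAB, and the substitution is then applied to the untouched original.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample; the gap between "two" and "three" must be a literal TAB
set "String=one two	three"
rem Detection runs on a space-free copy guarded by # on both sides
set "testTAB=#!String: =!#"
set "lead="
for /F "tokens=1,2" %%A in ("!testTAB!") do if not "%%B"=="" set "lead=%%A"
if not defined lead (echo No TAB found & exit /b)
rem Length of the leading token = offset of the first TAB in testTAB
rem (simple counting loop, an assumption replacing :strlen for brevity)
set "len=0"
:count
if not "!lead:~%len%,1!"=="" (set /a len+=1 & goto count)
set "charTAB=!testTAB:~%len%,1!"
rem Substitute in the ORIGINAL string, so its spaces survive
echo [!String:%charTAB%= !]
```

If the sample really contains a TAB, this should print [one two three]: the original space is kept and the TAB is turned into a space.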

So if one only needs to know whether the string contains any TABs, and does not need any substitution, then all they have to do is this:

Code: Select all

:: assuming the string does not contain any line feeds
for /F "tokens=2" %%A in ("#!string: =!#") do (
   REM The body of the loop executes only if there is at least one TAB in the string.
)

Squashman wrote:
21 Nov 2020 18:19
As you said, this code only works if there are NO spaces in your string.

I didn't say that, and the spaces are not a problem. But I think it should be obvious by now.

#4 Post by Squashman » 22 Nov 2020 10:23

sst wrote:
22 Nov 2020 03:31
I didn't say that, and the spaces are not a problem. But I think it should be obvious by now.

Well, maybe I don't understand the English language, but you certainly implied that with your REM comments.

And again, you could use a FOR /F to replace the tabs by concatenating the first and second tokens together. If you wanted to replace the tab with something else in that same FOR /F command, you could easily set a variable for that as well: if the variable is undefined, the two tokens are simply concatenated; if it is defined, its value takes the place of the TAB. You can then loop back to before the `FOR /F` and check for another tab, and keep replacing or removing using that same logic.

#5 Post by Squashman » 22 Nov 2020 10:43

Based on what I understand this may be an easier solution for you.

Code: Select all

@echo off
setlocal

SET "tabstg=Some	String	with	tabs"
SET /P "repl=Enter Replacement String or {enter} for none:"
:LOOP
FOR /F "tokens=1*" %%G IN ("%tabstg%") DO (
	IF NOT "%%H"=="" (
		SET "tabstg=%%G%repl%%%H"
		GOTO LOOP
	)
)
echo String = %tabstg%
endlocal

And some sample output:

Code: Select all

C:\Users\Squashman\Desktop>so.bat
Enter Replacement String or {enter} for none:_
String = Some_String_with_tabs

C:\Users\Squashman\Desktop>so.bat
Enter Replacement String or {enter} for none:
String = SomeStringwithtabs

#6 Post by sst » 22 Nov 2020 11:12

Squashman wrote:
22 Nov 2020 10:23
sst wrote:
22 Nov 2020 03:31
I didn't say that, and the spaces are not a problem. But I think it should be obvious by now.
Well, maybe I don't understand the English language, but you certainly implied that with your REM comments.

Code: Select all

:: Remove all Spaces
:: ...
set "testTAB=#!String: =!#"

Code: Select all

REM The default delimiters are <SPACE> and <TAB>
REM Since the string does not contain any <SPACE> chars,
REM a non-empty second token proves the presence of the TAB char

Removing the spaces before using the string in FOR /F is essential for TAB detection.
The TAB detection is performed on the modified version of the string, and then the substitution is performed on the original, untouched string.
It doesn't mean that the code only works if there are NO spaces in the string.

Squashman wrote:
22 Nov 2020 10:23
You can then loop back before the `FOR /F` again and check for another tab and keep replacing or removing using that same logic.

Yes, that's possible too, but with an unknown number of backward GOTOs.
The code may be simpler this way, but I don't see how that is more optimal than avoiding the slow loop altogether and performing the substitution only once.

#7 Post by sst » 22 Nov 2020 11:16

Squashman wrote:
22 Nov 2020 10:43
Based on what I understand this may be an easier solution for you.

Code: Select all

@echo off
setlocal

SET "tabstg=Some	String	with	tabs"
SET /P "repl=Enter Replacement String or {enter} for none:"
:LOOP
FOR /F "tokens=1*" %%G IN ("%tabstg%") DO (
	IF NOT "%%H"=="" (
		SET "tabstg=%%G%repl%%%H"
		GOTO LOOP
	)
)
echo String = %tabstg%
endlocal
This only works properly if there are no spaces in the string.
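
The effect can be reproduced without any TAB at all. The sketch below reuses the quoted loop unchanged; the sample string (spaces only) and the fixed replacement _ are assumptions chosen for the demonstration. Because FOR /F treats the space as a default delimiter too, every space is replaced as if it were a TAB.

```batch
@echo off
setlocal
rem Hypothetical sample: contains only spaces, no TAB characters at all
set "tabstg=Some String with spaces"
rem Fixed replacement instead of the SET /P prompt, to keep the demo non-interactive
set "repl=_"
:LOOP
for /F "tokens=1*" %%G in ("%tabstg%") do (
    if not "%%H"=="" (
        rem The first token ends at the first SPACE or TAB, whichever comes first
        set "tabstg=%%G%repl%%%H"
        goto LOOP
    )
)
echo String = %tabstg%
```

This prints String = Some_String_with_spaces, although the input never contained a TAB.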

#8 Post by Squashman » 22 Nov 2020 11:37

sst wrote:
22 Nov 2020 11:16
This only works properly if there are no spaces in the string.
Yes. We both have already established that fact. You knew that in your original code as you were removing the spaces before the FOR /F and you made a comment about it inside the FOR /F.

Regardless, none of that was the point of my argument. My point was that you could simplify your code by doing the replacement or removal within the FOR /F command if you know there are not going to be any spaces. Your code was removing the spaces from the string so you could use the FOR /F logic to simplify your code.

#9 Post by sst » 22 Nov 2020 12:08

Nope. I removed spaces as a temporary means of performing the TAB detection and obtaining the TAB character.
My code preserves the spaces while removing/replacing the TABs in the original string. Yours does not.
So it is not only a matter of simplification.

#10 Post by Eureka! » 22 Nov 2020 12:50

Basic concept (with comments for clarity)

Code: Select all

@echo off
setlocal

::  abc[space][space]def[tab]ghi.txt
    set STRING=abc  def	ghi.txt

::  Strip spaces
    set TEMPSTRING=%STRING: =%

::  Get first "word" before TAB
    for /f %%x in ("%TEMPSTRING%")  do set NOT=%%~x

::  Use all characters of first word as delimiter 
::  so you get the remainder (starting with a TAB)
    for /f "tokens=1 delims=%NOT%" %%x in ("%TEMPSTRING%")  do set REST=%%~x

::  TAB is the first character of the remainder
    set TAB=%REST:~0,1%
    echo TAB = __%TAB%___


::  Replace [TAB] with _
call echo NEWSTRING=%%STRING:%TAB%=_%%


#11 Post by sst » 22 Nov 2020 14:15

Thanks Eureka!. It is much better now.
Getting rid of :strlen was my primary concern. Now I'm relieved :D

#12 Post by Eureka! » 22 Nov 2020 15:03

You're welcome!

(FWIW: another alternative if you have delayedexpansion already enabled)

Code: Select all

@echo off
setlocal enabledelayedexpansion

    set STRING=abc  def	ghi.txt
    set TEMPSTRING=%STRING: =%
    for /f %%x in ("%TEMPSTRING%")  do set NOT=%%~x
    set REST=!TEMPSTRING:%NOT%=!
    set TAB=%REST:~0,1%
    echo NEWSTRING=!STRING:%TAB%=_!


#13 Post by sst » 22 Nov 2020 16:09

Unfortunately this one doesn't work as it is, if the first part of the string contains = or !
The case for ! can be resolved easily, but = is tough.
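
A small sketch of why = is the hard case, using stand-in data: the letter X plays the role of the TAB, and the sample values are hypothetical, while the variable names follow the script above. Once %NOT% is expanded, the first = inside it is taken as the separator between the search and the replacement string, so the wrong substitution is performed.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical stand-in data: "X" plays the role of the TAB character
set "TEMPSTRING=a=bXrest"
rem First "word" of the string; note that it contains an equals sign
set "NOT=a=b"
rem After percent expansion this reads !TEMPSTRING:a=b=!
rem which means: replace "a" with "b=", not: remove "a=b"
set "REST=!TEMPSTRING:%NOT%=!"
echo Expected remainder: Xrest
echo Actual remainder:   !REST!
```

The actual remainder comes out as b==bXrest, so its first character is no longer the one that followed the first word.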

As a side note you should use something like "#%TEMPSTRING%#" in FOR /F to protect the leading and trailing TABs.
The reason for protecting the leading TAB is obvious but protecting the trailing TAB is also necessary for the TAB detection phase which I know you've skipped for the sake of simplicity, as I'm sure you don't want to perform the operation on a string which doesn't contain TABs.
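
The role of the guard characters can be illustrated with spaces standing in for TABs, since both are default FOR /F delimiters and are tokenized the same way. The sample values and variable names below are assumptions made up for this sketch. A leading delimiter is skipped and a trailing one yields no second token, so neither is noticed unless the string is wrapped in ordinary characters.

```batch
@echo off
setlocal
rem Spaces stand in for TABs here: both are default FOR /F delimiters
set "leading= abc"
set "trailing=abc "
set "bareLead=no" & set "bareTrail=no" & set "guardLead=no" & set "guardTrail=no"
rem With tokens=2 the loop body runs only when a second token exists
for /F "tokens=2" %%A in ("%leading%") do set "bareLead=yes"
for /F "tokens=2" %%A in ("%trailing%") do set "bareTrail=yes"
for /F "tokens=2" %%A in ("#%leading%#") do set "guardLead=yes"
for /F "tokens=2" %%A in ("#%trailing%#") do set "guardTrail=yes"
echo Unguarded: leading=%bareLead% trailing=%bareTrail%
echo Guarded:   leading=%guardLead% trailing=%guardTrail%
```

The unguarded line reports no for both cases and the guarded line reports yes for both.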

#14 Post by sst » 22 Nov 2020 16:42

Here is an interactive version of the script with Eureka!'s suggestion applied.
Please note that SET /P strips trailing TABs from user input, so the case of trailing TABs cannot be tested directly with this interactive version. (At least that's the case on my Win7 x64.)

EDIT:
Trailing TABs can be tested by entering Ctrl-@ (displayed as ^@) followed by one more ordinary character at the end of the input string, e.g. Hello<TAB>^@z
Ctrl-@ (^@) is the null character, so everything after it is lost, but it prevents SET /P from removing the trailing TABs.

Code: Select all

@echo off
setlocal EnableExtensions DisableDelayedExpansion
for /F "tokens=3 delims=:" %%L in ("%~0") do endlocal & goto %%L
"%ComSpec%" /d /f:off /c "%~d0\:main:\..%~pnx0"
exit /b

:main
setlocal EnableExtensions EnableDelayedExpansion
:: Define LineFeed
set ^"LF=^%==%
%==%
%==%"
:: And the escaped one
set "eLF=^^!LF!!LF!"

:main.loop
set "charTAB="
set "String="
set /p "String=Enter a string with one or more TAB characters: "
if not defined String pause & exit /b

:: Remove all Spaces
:: And surround the string between two ordinary chars
:: to ensure that there will be no leading or trailing TABs
set "testTAB=#!String: =!#"
:: Remove all double quotation marks
set "testTAB=!testTAB:"=!"
:: Remove all LineFeeds
set ^"testTAB=!testTAB:%eLF%=!"
:: Remove all Bangs !
set "hasBang="
if not "!testTAB!"=="!testTAB:*!=!" (
    setlocal DisableDelayedExpansion
    set "hasBang=1"
    set "testTAB=%testTAB:!=%"
)
if defined hasBang (
    endlocal
    set "testTAB=%testTAB%"
)
:: Detecting the presence of TAB char in the string
set "hasTAB="
for /F "tokens=1,2" %%A in ("!testTAB!") do (
    if not "%%B"=="" (
        REM The default delimiters are <SPACE> and <TAB>
        REM Since the string does not contain any <SPACE> chars,
        REM a non-empty second token proves the presence of the TAB char
        set "hasTAB=1"
        set "delims=%%A"
    )
)
if not defined hasTAB (
    echo The input string does not contain the TAB character.
    goto :main.loop
)
:: Grab the TAB character from the String itself
for /F "delims=%delims%" %%A in ("!testTAB!") do set "charTAB=%%A"
set "charTAB=!charTAB:~0,1!"
:: Now remove the TABs from the original string
set "noTabString=!String:%charTAB%=!"
echo Original String:     [!String!]
echo After removing TABs: [!noTabString!]

goto :main.loop

#15 Post by penpen » 23 Nov 2020 09:39

I somehow missed this topic (sorry for the late reply). You could use the UTF-7 base64 encoding to store a tab in the source file without needing a literal tab (untested; in particular, I am not sure the value of the line variable is correct):

Code: Select all

@echo off
setlocal enableExtensions disableDelayedExpansion
call :getTab tab
echo(#%tab%#
goto :eof

:: If a tab-character is the first or last character in an input, it can't be read using set/p-command.
:: "<tab>" = 0x 00 22 00 09 00 22
::         = 0000 0000 0010 0010 0000 0000 0000 1001 0000 0000 0010 0010 b
::         = 000000 000010 001000 000000 000010 010000 000000 100010 b64
::         = ACIACQAi base64
:: tab data base64
+ACIACQAi-
:getTab
setlocal enableExtensions disableDelayedExpansion
set "cp=850"
for /f "tokens=*" %%a in ('chcp') do for %%b in (%%a) do set "cp=%%~nb"
set "line="
for /f "delims=:" %%a in ('findstr /n /r /c:"^:: tab data base64$" "%~f0"') do if not defined line set /a "line=%%~a+1"

>nul chcp 65000
<"%~f0" (
	rem Read exactly "line" lines, so that the last line read is the data line
	for /l %%a in (1, 1, %line%) do (
		set "data="
		set /p "data="
	)
)
>nul chcp %cp%
endlocal & set "%~1=%data:~1,1%"
goto :eof
penpen
