
Detecting and removing TABs from a string without prior access to the TAB character

Posted: 21 Nov 2020 13:49
by sst
Detecting and/or removing TAB characters from a string is a fairly simple task: define the TAB character using one of the many available methods and do a substitution, !string:%TAB%=!
But I was looking for a way to detect, and possibly remove or substitute, the TAB characters without resorting to any external command or using a literal TAB in the editor.
I came up with a solution, and it was simpler than I had imagined. Actually, the detection does not require access to the TAB character at all. For removal or substitution, the TAB character can be extracted from the subject string itself, so it is a fully self-contained solution.

Here is the sample script to demonstrate the method. It is fully commented and is fairly simple to understand.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
set "String=!%~1!"
if not defined String (
    echo Pass the name of the environment variable which contains one or more TAB characters
    exit /b
)
:: Define LineFeed
set ^"LF=^%==%
%==%
%==%"
:: And the escaped one
set "eLF=^^!LF!!LF!"

:: Remove all Spaces
:: And surround the string between two ordinary chars
:: to ensure that there will be no leading or trailing TABs
set "testTAB=#!String: =!#"
:: Remove all double quotation marks
set "testTAB=!testTAB:"=!"
:: Remove all LineFeeds
set ^"testTAB=!testTAB:%eLF%=!"
:: Remove all Bangs !
set "hasBang="
if not "!testTAB!"=="!testTAB:*!=!" (
    setlocal DisableDelayedExpansion
    set "hasBang=1"
    set "testTAB=%testTAB:!=%"
)
if defined hasBang (
    endlocal
    set "testTAB=%testTAB%"
)
:: Detecting the presence of TAB char in the string
set "hasTAB="
for /F "tokens=1,2" %%A in ("!testTAB!") do (
    if not "%%B"=="" (
        REM The default delimiters are <SPACE> and <TAB>
        REM Since the string does not contain any <SPACE> chars,
        REM a non-empty second token proves the presence of the TAB char
        set "hasTAB=1"
        set "lead=%%A"
    )
)
if defined hasTAB (
    echo The String contains the TAB character
    call :strlen len lead
    REM Extract the TAB character from the String itself
    for %%I in (!len!) do set "charTAB=!testTAB:~%%I,1!"
)
if defined hasTAB (
    REM Now remove the TABs from the original string
    set "noTabString=!String:%charTAB%=!"
    echo Original String:     [!String!]
    echo After removing TABs: [!noTabString!]
)
exit /b

:strlen <resultVar> <stringVar>
(
    setlocal EnableDelayedExpansion
    set "s=!%~2!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!s:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "s=!s:~%%P!"
        )
    )
    for %%L in (!len!) do (
        endlocal
        set "%~1=%%L"
    )
    exit /b
)

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 21 Nov 2020 18:19
by Squashman
As you said, this code only works if there are NO spaces in your string. But why go through all that extra code to remove the tab by getting the string length? You already have the TAB removed with the FOR /F command. And what if your string has multiple tabs? Again, the FOR /F would be more beneficial, as you could just keep checking whether the second token is not blank and use a GOTO to a label before the FOR /F command.

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 03:31
by sst
Squashman wrote:
21 Nov 2020 18:19
But why go through all that extra code to remove the tab by getting the string length? You already have the TAB removed with the FOR /F command. And what if your string has multiple tabs?
FOR /F cannot be used to remove the TABs from the string because, as you said, the string can contain multiple non-consecutive TABs. It can only be used to detect whether the string contains any TABs.
Moreover, one may not wish to remove the TABs but to substitute them with something else, say SPACES for example.
So for removal or substitution one needs to have access to the TAB character. That is why I used strlen to determine the offset of the first occurrence of the TAB character in the modified string, so that it can be grabbed and then used to apply the substitution to the original string: !string:%TAB%=SomeThingElse!

So if one only needs to know whether the string contains any TABs, and does not need any substitution, then all they have to do is this:

Code: Select all

:: assuming the string does not contain any line feeds
for /F "tokens=2" %%A in ("#!string: =!#") do (
   REM The body of the loop executes only if there is at least one TAB in the string.
)
Squashman wrote:
21 Nov 2020 18:19
As you said, this code only works if there are NO spaces in your string.
I didn't say that. And the spaces are not a problem. But I think it should be obvious by now.

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 10:23
by Squashman
sst wrote:
22 Nov 2020 03:31
I didn't say that. And the spaces are not a problem. But I think it should be obvious by now.
Well, maybe I don't understand the English language, but you certainly implied that with your REM comments.

And again you could use a FOR /F to replace the tabs by concatenating the first and second tokens together. And if you wanted to replace that tab with something in that same FOR /F command you could easily set a variable for that as well. If the variable is undefined then it will just concatenate the two tokens together. If it is defined the concatenation will replace the TAB. You can then loop back before the `FOR /F` again and check for another tab and keep replacing or removing using that same logic.

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 10:43
by Squashman
Based on what I understand this may be an easier solution for you.

Code: Select all

@echo off
setlocal

SET "tabstg=Some	String	with	tabs"
SET /P "repl=Enter Replacement String or {enter} for none:"
:LOOP
FOR /F "tokens=1*" %%G IN ("%tabstg%") DO (
	IF NOT "%%H"=="" (
		SET "tabstg=%%G%repl%%%H"
		GOTO LOOP
	)
)
echo String = %tabstg%
endlocal
And some sample output from running it.

Code: Select all

C:\Users\Squashman\Desktop>so.bat
Enter Replacement String or {enter} for none:_
String = Some_String_with_tabs

C:\Users\Squashman\Desktop>so.bat
Enter Replacement String or {enter} for none:
String = SomeStringwithtabs

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 11:12
by sst
Squashman wrote:
22 Nov 2020 10:23
sst wrote:
22 Nov 2020 03:31
I didn't say that. And the spaces are not a problem. But I think it should be obvious by now.
Well, maybe I don't understand the English language, but you certainly implied that with your REM comments.

Code: Select all

:: Remove all Spaces
:: ...
set "testTAB=#!String: =!#"

Code: Select all

REM The default delimiters are <SPACE> and <TAB>
REM Since the string does not contain any <SPACE> chars,
REM a non-empty second token proves the presence of the TAB char
Removing the spaces before using the string in FOR /F is essential for TAB detection.
The TAB detection is performed on the modified version of the string and then the substitution is performed on the original untouched string.
It doesn't mean that the code only works if there are NO spaces in the string.
Squashman wrote:
22 Nov 2020 10:23
You can then loop back before the `FOR /F` again and check for another tab and keep replacing or removing using that same logic.
Yes, that's possible too, but with an unknown number of backward GOTOs.
The code may be simpler this way, but I don't see how that is more optimal than avoiding the slow loop altogether and performing the substitution only once.

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 11:16
by sst
Squashman wrote:
22 Nov 2020 10:43
Based on what I understand this may be an easier solution for you.

Code: Select all

@echo off
setlocal

SET "tabstg=Some	String	with	tabs"
SET /P "repl=Enter Replacement String or {enter} for none:"
:LOOP
FOR /F "tokens=1*" %%G IN ("%tabstg%") DO (
	IF NOT "%%H"=="" (
		SET "tabstg=%%G%repl%%%H"
		GOTO LOOP
	)
)
echo String = %tabstg%
endlocal
This only works properly if there are no spaces in the string.
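
To illustrate, here is a minimal sketch of the same loop run on input that contains both spaces and a TAB. The test string and the forfiles line are my own assumptions, not part of the posted code: forfiles is used only to build a TAB for the test input (it expands 0x09 in its /c argument), so the script has to be saved and run as a .bat file.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
:: Test input only: forfiles expands 0x09 to a TAB character.
for /f "delims=" %%T in ('forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"') do set "TAB=%%T"
:: Hypothetical input: two spaces and one TAB.
set "tabstg=two words%TAB%after tab"
set "repl=_"
:LOOP
:: Default delimiters are SPACE and TAB, so both kinds of gap split the string.
for /F "tokens=1*" %%G in ("%tabstg%") do (
    if not "%%H"=="" (
        set "tabstg=%%G%repl%%%H"
        goto LOOP
    )
)
echo Result: [%tabstg%]
```

With this input the spaces are consumed as delimiters just like the TAB, so the result should come out as [two_words_after_tab] rather than [two words_after tab].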

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 11:37
by Squashman
sst wrote:
22 Nov 2020 11:16
This only works properly if there are no spaces in the string.
Yes. We both have already established that fact. You knew that in your original code as you were removing the spaces before the FOR /F and you made a comment about it inside the FOR /F.

Regardless, none of that was the point of my argument. My point was that you could simplify your code by doing the replacement or removal within the FOR /F command if you know there are not going to be any spaces. Your code was removing the spaces from the string so you could use the FOR /F logic to simplify your code.

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 12:08
by sst
Nope. I removed spaces as a temporary means of performing the TAB detection and obtaining the TAB character.
My code preserves the spaces while removing/replacing the TABs in the original string. Yours does not.
So it is not only a matter of simplification.
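
A condensed sketch of that idea, under a few assumptions: the test string is made up, the forfiles line exists only to put a TAB into the test input (so this must run as a .bat file), and the input contains no quotes, bangs or line feeds, which the full script handles separately. The TAB used for the substitution is taken from the string itself.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
:: Test input only: forfiles expands 0x09 to a TAB character.
for /f "delims=" %%T in ('forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"') do set "srcTAB=%%T"
set "String=two words%srcTAB%after tab"
:: Forget the helper so the TAB below really comes from the string.
set "srcTAB="

:: Work on a space-free, #-guarded copy; the original stays untouched.
set "testTAB=#!String: =!#"
:: Token 1 is everything before the first TAB.
for /F "tokens=1" %%A in ("!testTAB!") do set "lead=%%A"
:: With the characters of token 1 as delimiters, the next token starts at the TAB.
for /F "delims=%lead%" %%A in ("!testTAB!") do set "charTAB=%%A"
set "charTAB=!charTAB:~0,1!"
:: One substitution on the original string; its spaces are preserved.
set "result=!String:%charTAB%=_!"
echo Result: [!result!]
```

For this input the output should be [two words_after tab]: the TAB is replaced and both spaces survive.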

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 12:50
by Eureka!
Basic concept (with comments for clarity)

Code: Select all

@echo off
setlocal

::  abc[space][space]def[tab]ghi.txt
    set STRING=abc  def	ghi.txt

::  Strip spaces
    set TEMPSTRING=%STRING: =%

::  Get first "word" before TAB
    for /f %%x in ("%TEMPSTRING%")  do set NOT=%%~x

::  Use all characters of first word as delimiter 
::  so you get the remainder (starting with a TAB)
    for /f "tokens=1 delims=%NOT%" %%x in ("%TEMPSTRING%")  do set REST=%%~x

::  TAB is the first character of the remainder
    set TAB=%REST:~0,1%
    echo TAB = __%TAB%___


::  Replace [TAB] with _
call echo NEWSTRING=%%STRING:%TAB%=_%%


Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 14:15
by sst
Thanks Eureka!. It is much better now.
Getting rid of :strlen was my primary concern. Now I'm relieved :D

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 15:03
by Eureka!
You're welcome!

(FWIW: another alternative if you have delayedexpansion already enabled)

Code: Select all

@echo off
setlocal enabledelayedexpansion

    set STRING=abc  def	ghi.txt
    set TEMPSTRING=%STRING: =%
    for /f %%x in ("%TEMPSTRING%")  do set NOT=%%~x
    set REST=!TEMPSTRING:%NOT%=!
    set TAB=%REST:~0,1%
    echo NEWSTRING=!STRING:%TAB%=_!


Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 16:09
by sst
Unfortunately, this one doesn't work as it is if the first part of the string contains = or !.
The case for ! can be resolved easily, but = is tough.
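
A small sketch of why = is the hard case, using made-up values (the letter X stands in for the TAB, and the variable contents are assumptions for illustration only). The first = in the expansion always ends the search term, so a first word that contains = changes both what is searched for and what it is replaced with.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
:: Hypothetical values: the first "word" is a=b, X plays the role of the TAB.
set "TEMPSTRING=a=bXc"
set "NOT=a=b"
:: Intended: remove the text a=b, leaving Xc.
:: Actual: this expands to !TEMPSTRING:a=b=! which means
:: search for "a" and replace it with "b=".
set "REST=!TEMPSTRING:%NOT%=!"
echo REST = [!REST!]
```

This prints REST = [b==bXc] instead of [Xc], so the first character of REST is not the TAB.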

As a side note, you should use something like "#%TEMPSTRING%#" in FOR /F to protect the leading and trailing TABs.
The reason for protecting the leading TAB is obvious, but protecting the trailing TAB is also necessary for the TAB detection phase, which I know you've skipped for the sake of simplicity; I'm sure you wouldn't want to perform the operation on a string that doesn't contain TABs.
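
A quick sketch of what the # guards change, with made-up test strings. As an assumption for the demo, the TAB is produced with forfiles (which expands 0x09), so this has to run as a .bat file. FOR /F skips leading delimiters and ignores trailing ones, so a TAB at either end of the string yields no second token unless an ordinary character sits on the other side of it.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
:: Test input only: forfiles expands 0x09 to a TAB character.
for /f "delims=" %%T in ('forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"') do set "TAB=%%T"
:: Hypothetical inputs: one leading TAB, one trailing TAB.
set "lead=%TAB%abc"
set "trail=abc%TAB%"
for %%V in (lead trail) do (
    set "bare=no"
    set "guarded=no"
    REM The loop body runs only when a second token exists.
    for /F "tokens=2" %%A in ("!%%V!") do set "bare=yes"
    for /F "tokens=2" %%A in ("#!%%V!#") do set "guarded=yes"
    echo %%V: bare=!bare! guarded=!guarded!
    set "%%V.bare=!bare!"
    set "%%V.guarded=!guarded!"
)
```

Both lines should report bare=no guarded=yes: without the guards the TAB goes undetected in either position.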

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 22 Nov 2020 16:42
by sst
Here is an interactive version of the script with Eureka!'s suggestion applied.
Please note that SET /P strips trailing TABs from user input, so the case for trailing TABs cannot be tested directly with this interactive version. (At least that's the case on my Win7 x64.)

EDIT:
Trailing TABs can be tested by entering Ctrl-@ (displayed as ^@) followed by one more ordinary character at the end of the input string, e.g. Hello<TAB>^@z
Ctrl-@ (^@) is the null character, so everything after it will be lost, but it prevents SET /P from removing the trailing TABs.

Code: Select all

@echo off
setlocal EnableExtensions DisableDelayedExpansion
for /F "tokens=3 delims=:" %%L in ("%~0") do endlocal & goto %%L
"%ComSpec%" /d /f:off /c "%~d0\:main:\..%~pnx0"
exit /b

:main
setlocal EnableExtensions EnableDelayedExpansion
:: Define LineFeed
set ^"LF=^%==%
%==%
%==%"
:: And the escaped one
set "eLF=^^!LF!!LF!"

:main.loop
set "charTAB="
set "String="
set /p "String=Enter a string with one or more TAB characters: "
if not defined String pause & exit /b

:: Remove all Spaces
:: And surround the string between two ordinary chars
:: to ensure that there will be no leading or trailing TABs
set "testTAB=#!String: =!#"
:: Remove all double quotation marks
set "testTAB=!testTAB:"=!"
:: Remove all LineFeeds
set ^"testTAB=!testTAB:%eLF%=!"
:: Remove all Bangs !
set "hasBang="
if not "!testTAB!"=="!testTAB:*!=!" (
    setlocal DisableDelayedExpansion
    set "hasBang=1"
    set "testTAB=%testTAB:!=%"
)
if defined hasBang (
    endlocal
    set "testTAB=%testTAB%"
)
:: Detecting the presence of TAB char in the string
set "hasTAB="
for /F "tokens=1,2" %%A in ("!testTAB!") do (
    if not "%%B"=="" (
        REM The default delimiters are <SPACE> and <TAB>
        REM Since the string does not contain any <SPACE> chars,
        REM a non-empty second token proves the presence of the TAB char
        set "hasTAB=1"
        set "delims=%%A"
    )
)
if not defined hasTAB (
    echo The input string does not contain the TAB character.
    goto :main.loop
)
:: Grab the TAB character from the String itself
for /F "delims=%delims%" %%A in ("!testTAB!") do set "charTAB=%%A"
set "charTAB=!charTAB:~0,1!"
:: Now remove the TABs from the original string
set "noTabString=!String:%charTAB%=!"
echo Original String:     [!String!]
echo After removing TABs: [!noTabString!]

goto :main.loop

Re: Detecting and removing TABs from a string without prior access to the TAB character

Posted: 23 Nov 2020 09:39
by penpen
I somehow missed this topic (sorry for the late reply). You could use the UTF-7 base64 encoding to store a TAB in the source file without needing a literal TAB (untested; in particular, I am not sure the value of the line variable is correct):

Code: Select all

@echo off
setlocal enableExtensions disableDelayedExpansion
call :getTab tab
echo(#%tab%#
goto :eof

:: If a tab-character is the first or last character in an input, it can't be read using set/p-command.
:: "<tab>" = 0x 00 22 00 09 00 22
::         = 0000 0000 0010 0010 0000 0000 0000 1001 0000 0000 0010 0010 b
::         = 000000 000010 001000 000000 000010 010000 000000 100010 b64
::         = ACIACQAi base64
:: tab data base64
+ACIACQAi-
:getTab
setlocal enableExtensions disableDelayedExpansion
set "cp=850"
for /f "tokens=*" %%a in ('chcp') do for %%b in (%%a) do set "cp=%%~nb"
set "line="
for /f "delims=:" %%a in ('findstr /n /r /c:"^:: tab data base64$" "%~f0"') do if not defined line set /a "line=%%~a+1"

>nul chcp 65000
<"%~f0" (
	for /l %%a in (0, 1, %line%) do (
		set "data="
		set /p "data="
	)
)
>nul chcp %cp%
endlocal & set "%~1=%data:~1,1%"
goto :eof
penpen