Rules for how CMD.EXE parses numbers
Posted: 07 Sep 2012 20:37
EDIT - After this initial post, many additional discoveries were made and documented by various people in later posts of this thread. This initial post has been edited to account for some, but not all, of those additional discoveries. Please read the entire thread for additional edge cases and differences between Windows versions.
There are multiple contexts where CMD.EXE parses a string into a 4 byte signed integer value ranging from -2147483648 to 2147483647:
- SET /A
- IF
- %var:~n,m% (variable substring expansion)
- FOR /F "TOKENS=n"
- FOR /F "SKIP=n"
- FOR /L %%A in (n1 n2 n3)
In all the above contexts, CMD can parse numbers expressed as decimal, hexadecimal, or octal notation:
decimal: [-]{non-zero decimal digit}[{decimal digit}...]
hexadecimal: [-]0{x|X}{hexadecimal digit}[{hexadecimal digit}...]
octal: [-]0{octal digit}[{octal digit}...]
{decimal digit} = any of {0|1|2|3|4|5|6|7|8|9}
{hexadecimal digit} = any of {0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|a|b|c|d|e|f}
{octal digit} = any of {0|1|2|3|4|5|6|7}
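As a quick way to check which notation a token falls under, the grammar above can be transcribed into regular expressions. This is only a sketch in Python, not anything CMD.EXE itself does, and the names `classify`, `DECIMAL`, `HEXADECIMAL` and `OCTAL` are made up for illustration.

```python
import re

# Patterns transcribed literally from the grammar above (illustrative names).
DECIMAL = re.compile(r"-?[1-9][0-9]*")
HEXADECIMAL = re.compile(r"-?0[xX][0-9A-Fa-f]+")
OCTAL = re.compile(r"-?0[0-7]+")

def classify(token):
    """Return the notation that the whole token matches, or None."""
    for name, pattern in (("decimal", DECIMAL),
                          ("hexadecimal", HEXADECIMAL),
                          ("octal", OCTAL)):
        if pattern.fullmatch(token):
            return name
    return None

print(classify("17"))    # decimal
print(classify("0x11"))  # hexadecimal
print(classify("021"))   # octal
print(classify("09"))    # None, because 9 is not an octal digit
```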
But there are subtle differences depending on the context. The differences are in how negative numbers are parsed, and also how overflow and invalid number errors are handled. It appears there is one set of rules for SET /A, and another set of rules used by all other contexts. To make matters worse, SET /A behavior on XP is different than the more modern Windows versions (Vista onward).
One additional command accepts only decimal numbers:
- EXIT [/B] n
SET /A
(Vista, Windows 7, [Windows 8?])
Literals
decimal - The sign is initially ignored and the string of decimal digits is first converted into its unsigned binary representation. Afterward, if the number was preceded by a negative sign, the negative value is computed by taking the two's complement of the binary value (invert the bits and add 1).
-1 -> 0000 0000 0000 0000 0000 0000 0000 0001 -> 1111 1111 1111 1111 1111 1111 1111 1111
Everything works great except the negative limit of a signed 4 byte integer cannot be expressed! The problem is the parser limits itself to 31 bits in the 1st step. If the 32nd bit (the sign bit) is set, then the parser detects an overflow error.
-2147483648 -> 1000 0000 0000 0000 0000 0000 0000 0000 : ERROR - overflow detected
The actual error message is "Invalid number. Numbers are limited to 32-bits of precision.", with ERRORLEVEL=1073750992. Very misleading and unfortunate if you ask me.
If an invalid digit is used, then a different error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991.
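The decimal rules can be modelled in a few lines of Python. This is a sketch of the behavior described above, not CMD.EXE's actual code; `parse_set_a_decimal` and `SetAError` are hypothetical names, and the function assumes it is only handed tokens meant as decimal literals.

```python
class SetAError(Exception):
    """Hypothetical stand-in for the two SET /A error messages."""

def parse_set_a_decimal(text):
    """Model the Vista+ SET /A rules for a decimal literal such as '-17'."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise SetAError("Invalid number. Numeric constants are either "
                        "decimal (17),hexadecimal (0x11), or octal (021).")
    magnitude = int(digits)
    if magnitude > 0x7FFFFFFF:
        # The unsigned parse is limited to 31 bits; the 32nd bit is an error.
        raise SetAError("Invalid number. Numbers are limited to "
                        "32-bits of precision.")
    # Two's complement: invert the bits and add 1, staying within 32 bits.
    bits = ((~magnitude) + 1) & 0xFFFFFFFF if negative else magnitude
    # Reinterpret the 32-bit pattern as a signed value.
    return bits - 0x100000000 if bits & 0x80000000 else bits

print(parse_set_a_decimal("-1"))           # -1
print(parse_set_a_decimal("-2147483647"))  # -2147483647
```

Feeding it "-2147483648" raises the overflow error, because the magnitude 2147483648 already needs the 32nd bit before the sign is applied.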
hexadecimal - The parser initially ignores any sign and converts the string of hexadecimal digits into its unsigned binary representation. If the number was preceded by a negative sign, then the negative value is computed by taking the two's complement of the binary value.
The difference is that the SET /A hexadecimal parser allows the 32nd bit to be set during the initial parsing. After the initial parsing is complete, the 32nd bit is treated as the sign bit. So there are 2 representations for every number!
0x1 -> 0000 0000 0000 0000 0000 0000 0000 0001 = 1
-0xFFFFFFFF -> 1111 1111 1111 1111 1111 1111 1111 1111 -> 0000 0000 0000 0000 0000 0000 0000 0001 = 1
0xFFFFFFFF -> 1111 1111 1111 1111 1111 1111 1111 1111 = -1
-0x1 -> 0000 0000 0000 0000 0000 0000 0000 0001 -> 1111 1111 1111 1111 1111 1111 1111 1111 = -1
The oddball is -2147483648 because the two's complement of that number is itself!
0x80000000 -> 1000 0000 0000 0000 0000 0000 0000 0000
-0x80000000 -> 1000 0000 0000 0000 0000 0000 0000 0000 -> 1000 0000 0000 0000 0000 0000 0000 0000
Actually there are many more representations for each number because additional leading 0s can be added. There is no limit other than the 8191 limit to a command line.
0x1, 0x01, 0x00000000000000000000000000000000001 are all equivalent representations of 1.
Another odd SET /A behavior is that overflow conditions are ignored when parsing hexadecimal notation. Any hex notation that would require 33 or more bits will result in either 1 or -1.
The following all result in -1:
set /a 0x1000000000
set /a 0xFFFFFFFFFF
set /a 0x888888888888888888888
The following all result in 1:
set /a -0x1000000000
set /a -0xFFFFFFFFFF
set /a -0x888888888888888888888
If an invalid hex digit is used, then the error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991.
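Here is a Python sketch that reproduces the hexadecimal results listed above. One part is an assumption on my side: the post only says that oversized hex values end up as 1 or -1, and the sketch gets there by saturating the unsigned value at 0xFFFFFFFF. The name `parse_set_a_hex` is made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def parse_set_a_hex(text):
    """Model the Vista+ SET /A rules for a literal such as '-0x1F'."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    digits = body[2:]
    if body[:2].lower() != "0x" or not digits or any(
            ch not in "0123456789abcdefABCDEF" for ch in digits):
        raise ValueError("Invalid number. Numeric constants are either "
                         "decimal (17),hexadecimal (0x11), or octal (021).")
    value = int(digits, 16)
    # ASSUMPTION: overflow is modelled as saturation to all 32 bits set,
    # which matches the observed results of -1 and 1.
    value = min(value, MASK32)
    if negative:
        value = (-value) & MASK32          # two's complement within 32 bits
    # The 32nd bit is treated as the sign bit only after parsing.
    return value - 0x100000000 if value & 0x80000000 else value

print(parse_set_a_hex("0xFFFFFFFF"))     # -1
print(parse_set_a_hex("-0xFFFFFFFF"))    # 1
print(parse_set_a_hex("-0x80000000"))    # -2147483648
print(parse_set_a_hex("0x1000000000"))   # -1
```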
octal - The number is parsed similarly to decimal. The sign is initially ignored and the octal digits are converted into a 31 bit unsigned integer. If the 32nd bit is set then an overflow error is detected. Any negative sign is applied afterward by taking the two's complement, but only if no error was detected.
So -2147483648 cannot be represented with octal notation, just as it cannot be represented with decimal notation.
Just like with hexadecimal, any number of leading zeros may be prefixed to a valid octal number.
00000000000000000000000000000000000000000000000001 --> 1
-00000000000000000000000000000000000000000000000001 --> -1
If an invalid octal digit is used, then the error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991. This error is a common occurrence when decimal 8 or 9 is zero prefixed, as can occur when parsing date and time information.
(XP)
Hexadecimal, decimal, and octal on XP all follow rules similar to hexadecimal on Vista and beyond.
First, any leading minus sign is ignored and the value is parsed as a 32 bit unsigned integer. Afterward, the 32nd bit is treated as a sign bit, and then any minus sign is applied by taking the two's complement. So every value has at least two representations with each base:
SET /A 2147483650 = SET /A -2147483646 = -2147483646
SET /A 0x80000002 = SET /A -0x7FFFFFFE = -2147483646
SET /A 020000000002 = SET /A -017777777776 = -2147483646
Any value that exceeds 32 bits during the initial unsigned parsing results in an overflow error.
There is one complication when all bits are set. When CMD.EXE is first launched, the following all give the expected value of -1:
SET /A 4294967295 = -1
SET /A 0XFFFFFFFF = -1
SET /A 037777777777 = -1
But if a math overflow is ever detected by CMD.EXE, then the above three will raise an overflow error instead for the remainder of the CMD.EXE session. The triggering overflow can occur in SET /A as described above. It can also occur with IF, FOR "TOKENS=n", FOR "SKIP=n", FOR /L, and variable expansion with substring operations as described later on.
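The basic XP rule (ignoring the session-wide complication with all bits set) can be sketched in Python as follows. This is a model of the description above, not XP's real code, and `parse_set_a_xp` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def parse_set_a_xp(text):
    """Model the XP SET /A literal rules for all three notations."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2].lower() == "0x":
        base, digits = 16, body[2:]
    elif body.startswith("0") and len(body) > 1:
        base, digits = 8, body[1:]
    else:
        base, digits = 10, body
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits.lower()):
        raise ValueError("invalid number")
    value = int(digits, base)
    if value > MASK32:
        raise OverflowError("more than 32 bits in the unsigned parse")
    if negative:
        value = (-value) & MASK32          # two's complement within 32 bits
    # The 32nd bit acts as the sign bit.
    return value - 0x100000000 if value & 0x80000000 else value

for literal in ("2147483650", "-2147483646", "0x80000002",
                "-0x7FFFFFFE", "020000000002", "-017777777776"):
    print(literal, "->", parse_set_a_xp(literal))   # all give -2147483646
```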
Variables (all versions)
The rules for parsing un-expanded numeric variables are different. All three numeric notations employ a similar strategy: First ignore any leading negative sign and convert the number into an unsigned binary representation, stopping as soon as an invalid character is reached. Then apply any leading negative sign by taking the two's complement.
The big difference is that overflow conditions no longer result in an error. Instead the maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.
Undefined variables are treated as zero, as are defined variables that do not begin with a valid number. But something like -123JUNK is assigned the value -123, because parsing stops at the first invalid character.
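To make the variable rules concrete, here is a small Python model: it reads digits until the first invalid character and saturates on overflow instead of raising an error. It is a sketch of the description above only; `variable_value` is a made-up name, `None` stands for an undefined variable, and details such as surrounding whitespace are not modelled.

```python
INT_MAX = 2147483647
INT_MIN = -2147483648

def variable_value(text):
    """Model how SET /A reads the content of an unexpanded variable."""
    if text is None:                       # undefined variable
        return 0
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2].lower() == "0x":
        digits, body = "0123456789abcdef", body[2:]
    elif body.startswith("0"):
        digits = "01234567"
    else:
        digits = "0123456789"
    magnitude = 0
    for ch in body.lower():
        if ch not in digits:               # stop at the first invalid character
            break
        magnitude = magnitude * len(digits) + digits.index(ch)
    value = -magnitude if negative else magnitude
    # Overflow is not an error here; the value saturates.
    return max(INT_MIN, min(INT_MAX, value))

print(variable_value("-123JUNK"))        # -123
print(variable_value("JUNK"))            # 0
print(variable_value("99999999999"))     # 2147483647
print(variable_value("-99999999999"))    # -2147483648
```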
IF
IF only parses numbers when one of (EQU, NEQ, LSS, LEQ, GTR, GEQ) is used. The == comparison operator always results in a string comparison.
All three numeric notations employ a similar strategy: First ignore any leading negative sign and parse the number into an unsigned binary 31 bit integer. Then apply any leading negative sign by taking the two's complement.
The big difference is that overflow conditions no longer result in an error. Instead the appropriately signed maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.
if 2147483647 equ 999999999999999999999999 echo These numbers are equal
if -2147483648 equ -999999999999999999999999 echo These numbers are equal
if 0xFFFFFFFFFFFFF equ 2147483647 echo These numbers are equal
if -0xFFFFFFFFFFFF equ -2147483648 echo These numbers are equal
if 077777777777777 equ 2147483647 echo These numbers are equal
if -077777777777777 equ -2147483648 echo These numbers are equal
This is a radical departure for hex notation. With SET /A, 0xFFFFFFFF sets the sign bit and the value is -1. With IF, 0xFFFFFFFF is treated as a positive number with an overflow condition, so it becomes 2147483647.
One other major difference - Numeric parsing is abandoned when an invalid digit is detected and IF uses a string comparison. Numeric parsing is also abandoned if the number starts with two or more minus signs.
if 09 lss 9 echo TRUE because 9 is an invalid octal digit so string comparison is used
if --1 gtr 1 echo TRUE because only one minus allowed for numbers
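The IF rules differ from the variable rules in one key way: an invalid character makes IF give up on numeric parsing altogether. A Python sketch of that decision follows. It models the description above, `if_number` is a hypothetical name, and it returns `None` where IF would fall back to a string comparison (the string comparison itself is not modelled).

```python
INT_MAX = 2147483647
INT_MIN = -2147483648

def if_number(token):
    """Return the value IF would compare, or None for a string fallback."""
    negative = token.startswith("-")
    body = token[1:] if negative else token
    if body[:2].lower() == "0x":
        base, body = 16, body[2:]
    elif body.startswith("0"):
        base = 8
    else:
        base = 10
    allowed = "0123456789abcdef"[:base]
    if not body or any(ch not in allowed for ch in body.lower()):
        return None                        # invalid digit or a second minus sign
    magnitude = int(body, base)
    value = -magnitude if negative else magnitude
    # Overflow saturates to the appropriately signed maximum magnitude.
    return max(INT_MIN, min(INT_MAX, value))

print(if_number("0xFFFFFFFF"))                  # 2147483647
print(if_number("-999999999999999999999999"))   # -2147483648
print(if_number("09"))                          # None
print(if_number("--1"))                         # None
```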
%var:~n,m% (variable substring expansion)
I believe substring numeric parsing is the same as for IF, but it is difficult to prove because variables are limited to length 8191. The only thing I can prove is that overflow conditions give the same result as a non-overflow number that exceeds the length of the string.
set var=hello
::All of the following statements print out the entire string
echo %var:~0,5%
echo %var:~0,10%
echo %var:~0,9999999999999999999%
echo %var:~0,0xA%
echo %var:~0,0xFFFFFFFFFFFFFFFFF%
echo %var:~0,05%
echo %var:~0,0777777777777777777%
echo %var:~-5%
echo %var:~-10%
echo %var:~-9999999999999999999%
echo %var:~-0xA%
echo %var:~-0xFFFFFFFFFFFFFFFFF%
echo %var:~-05%
echo %var:~-0777777777777777777%
If an invalid digit or more than one minus sign is detected, then variable expansion is aborted and the result is the code (minus the percents) instead of a substring of the value.
echo %var:~09% --> var:~09
FOR /F "TOKENS=n"
Again I believe the numeric parsing rules are the same as for IF, but it is even more difficult to prove.
Any value < 1 results in a syntax error. This includes negative values with an overflow condition.
Also, an invalid number due to an invalid digit or multiple minus signs results in a syntax error.
Any value > 31 results in a FOR /F parsing no-op (that request for a token is ignored) because FOR /F is limited to parsing a maximum of 31 tokens. This includes positive numbers with an overflow condition.
for /f "tokens=31,32,0xFFFFFFFFFFFF,1" %A in (
"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33"
) do @echo A=%A, B=%B, C=%C, D=%D
The above results in A=1, B=31, C=%C, D=%D
This has nothing to do with number parsing, but note how the token numbers are sorted prior to assigning the letters.
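The token selection described above can be sketched in Python. This is a model of the observed behavior only. The names `parse_number`, `select_tokens` and `assign` are made up, ranges like 2-5 and the * suffix are not handled, and duplicate token numbers are collapsed, which the post does not address. Python integers do not overflow, so no clamping is needed to decide whether a value is above 31 or below 1.

```python
def parse_number(text):
    """Parse decimal, 0x hex, or 0-prefixed octal; ValueError if invalid."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2].lower() == "0x":
        base, body = 16, body[2:]
    elif body.startswith("0") and len(body) > 1:
        base, body = 8, body[1:]
    else:
        base = 10
    allowed = "0123456789abcdef"[:base]
    if not body or any(ch not in allowed for ch in body.lower()):
        raise ValueError("syntax error: invalid number " + text)
    value = int(body, base)
    return -value if negative else value

def select_tokens(spec):
    """Turn '31,32,0xFFFFFFFFFFFF,1' into the sorted usable token numbers."""
    chosen = set()
    for item in spec.split(","):
        n = parse_number(item)
        if n < 1:
            raise ValueError("syntax error: token number below 1")
        if n <= 31:                        # requests above 31 are ignored
            chosen.add(n)
    return sorted(chosen)

def assign(first_letter, spec, line):
    """Map FOR variable letters to tokens of a space-delimited line."""
    words = line.split()
    result = {}
    for offset, n in enumerate(select_tokens(spec)):
        if n <= len(words):                # absent tokens are left out here
            result[chr(ord(first_letter) + offset)] = words[n - 1]
    return result

line = " ".join(str(i) for i in range(1, 34))
print(assign("A", "31,32,0xFFFFFFFFFFFF,1", line))  # {'A': '1', 'B': '31'}
```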
FOR /F "SKIP=n"
Exactly the same as "TOKENS" except I believe the max SKIP value is 2147483647. I did some testing, but it is a pain, and I'm not sure I tested properly.
Any SKIP value < 1 results in a syntax error.
Also, an invalid number due to an invalid digit or multiple minus signs results in a syntax error.
I believe any SKIP value > 2147483647 results in immediate termination of the parsing of the input. But the command is still executed even if the positive overflow occurs.
for /f "skip=0x80000000" %A in ('dir "does not exist"') do @echo %A
The above generates the "File Not Found" error.
From what I remember of my testing, "SKIP=0x7FFFFFFF" correctly skipped the expected number of lines in a huge file, and "SKIP=0x80000000" returned immediately without error and without taking the time to scan the huge file.
FOR /L %%A in (n1 n2 n3)
Again, all three numbers are parsed using basically the same rules as used by IF. Overflow values are converted into the appropriately signed maximum magnitude value.
Here is a demonstration of all three number bases:
C:\test>for /L %N in (0xF 012 35) do @echo %N
15
25
35
Here is a demonstration of both positive and negative overflow:
C:\test>for /L %N in (999999999999999999999999 -1 0x7FFFFFFD) do @echo %N
2147483647
2147483646
2147483645
C:\test>for /L %N in (-0777777777777777777777777 1 -0x7FFFFFFD) do @echo %N
-2147483648
-2147483647
-2147483646
-2147483645
FOR /L will parse each numeric token up until it finds an invalid character (invalid digit, or multiple minus signs). If the token starts off with an invalid character, then it is treated as zero. Not shown, but missing values are also treated as zero.
C:\test>for /L %N in (G45 1 038) do @echo %N
0
1
2
3
This has nothing to do with parsing, but setting the end value to the max (or min) value will result in an endless loop if the increment matches the sign because the incremented value results in an overflow which results in the opposite sign. The examples below use EXIT to break out of the endless loop.
C:\test>cmd /c for /l %N in (0x7FFFFFFE 1 0x7FFFFFFF) do @(echo %N^&if %N geq -0x7FFFFFFE exit)
2147483646
2147483647
-2147483648
-2147483647
-2147483646
C:\test>cmd /c for /l %N in (-0x7FFFFFFE -1 -0x80000000) do @(echo %N^&if %N leq 0x7FFFFFFE exit)
-2147483646
-2147483647
-2147483648
2147483647
2147483646
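The wraparound in the endless loop can be reproduced with a short Python model of 32 bit signed arithmetic. This is only a sketch; `wrap32` and `for_l` are hypothetical names, and the `limit` parameter plays the role that EXIT plays in the commands above.

```python
def wrap32(n):
    """Reduce n to a signed 32 bit value, the way an overflowing add wraps."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def for_l(start, step, end, limit):
    """Collect at most `limit` loop values of FOR /L (start step end)."""
    values = []
    n = start
    while len(values) < limit and (
            (step >= 0 and n <= end) or (step < 0 and n >= end)):
        values.append(n)
        n = wrap32(n + step)               # overflow flips the sign
    return values

print(for_l(0x7FFFFFFE, 1, 0x7FFFFFFF, 5))
# [2147483646, 2147483647, -2147483648, -2147483647, -2147483646]
print(for_l(-0x7FFFFFFE, -1, -0x80000000, 5))
# [-2147483646, -2147483647, -2147483648, 2147483647, 2147483646]
```

Because the end value is already the largest (or smallest) representable number, the loop condition can never become false, so only the limit stops the model.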
EXIT n
EXIT /B n
The EXIT command is radically different than all other contexts in that it only accepts decimal values. The return code can be any value represented by a 32 bit signed integer.
Any decimal number of any magnitude will be accepted. There is a perpetual overflow rollover into the opposite sign. I believe any leading minus sign is initially ignored, and the value is parsed into a 32 bit unsigned integer, modulo 4294967296. If there was a leading minus sign, then the two's complement is taken, and finally the 32nd bit is then treated as a sign bit.
Leading zeros are ignored.
The token is parsed as a decimal number up until the first invalid character. All remaining characters are then ignored.
A token that starts off with an invalid character is treated as no value (the result is the same as calling EXIT without any argument).
@echo off
setlocal enableDelayedExpansion
for %%N in (
2147483647
2147483648
4294967295
4294967296
4294967297
6442450943
6442450944
8589934591
8589934592
8589934593
00000045
51g
""
--1
hello
) do (
echo EXIT /B %%~N
call :test %%N
echo !errorlevel!
echo(
)
exit /b
:test
exit /b %1
--OUTPUT--
EXIT /B 2147483647
2147483647
EXIT /B 2147483648
-2147483648
EXIT /B 4294967295
-1
EXIT /B 4294967296
0
EXIT /B 4294967297
1
EXIT /B 6442450943
2147483647
EXIT /B 6442450944
-2147483648
EXIT /B 8589934591
-1
EXIT /B 8589934592
0
EXIT /B 8589934593
1
EXIT /B 00000045
45
EXIT /B 51g
51
EXIT /B
51
EXIT /B --1
51
EXIT /B hello
51
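The EXIT results above are consistent with a simple modulo model, sketched here in Python. It follows the rules as I believe them to be, not CMD.EXE's actual code. `exit_code` is a made-up name, and it returns `None` where EXIT behaves as if no argument was given (which is why the test script keeps reporting the previous value of 51).

```python
def exit_code(token):
    """Model the return code EXIT [/B] derives from its argument."""
    negative = token.startswith("-")
    body = token[1:] if negative else token
    digits = ""
    for ch in body:
        if ch not in "0123456789":         # stop at the first invalid character
            break
        digits += ch
    if not digits:
        return None                        # treated like EXIT with no value
    value = int(digits) % 0x100000000      # unsigned 32 bit, modulo 4294967296
    if negative:
        value = (-value) % 0x100000000     # two's complement
    # Finally the 32nd bit is treated as the sign bit.
    return value - 0x100000000 if value >= 0x80000000 else value

for token in ("2147483648", "4294967297", "6442450943", "00000045", "51g", "--1"):
    print(token, "->", exit_code(token))
```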
Dave Benham