
Assembly language code "in-line" for Batch files!

Posted: 22 Feb 2015 01:22
by Aacini
After I read this post by penpen:

penpen wrote:I've once started to write an assembler using batch (actually i don't know if i will ever finish it)...

... I couldn't resist the temptation to write my own assembler in Batch! (I don't understand why this always happens to me :roll: ). However, in this case my objective was very specific: create a valid executable file in the simplest possible way from assembly source code placed inside the Batch file (a Batch-assembly hybrid!). To achieve this goal, I chose the simplest 16-bit instructions of the 80286 CPU and the straightforward MS-DOS .com file format, and set some limitations on the allowed assembly source code. Despite these restrictions, the assembly code is standard and may be assembled by any other assembler once all the required paraphernalia is added. The assembly code used in my Batch program is pretty simple. Here it is!

Code: Select all

@echo off

rem BatchAsm.bat: Limited version of a x86 16-bits "in-line" assembler written in Batch
rem Antonio Perez Ayala
rem 2015/02/21 - First version

rem Example: Create example.com executable file

rem Defining the following variable activates the creation of the .lst listing file
setlocal
set .list=1

rem The name of the .com file preceded by colon must appear after "goto" in
rem "call :asm" line as shown below; the assembly source code starts at next line
rem (TO DO: change this method by a macro with one parameter ;)

call :asm example.com  &  goto :example.com

        jmp     start                   ;jumps over data area

CR              EQU     13
LF              EQU     10
EXCLAM          EQU     33              ;Ascii code of "!"

text1           DB      "Hello $"
text2           DB      "World",EXCLAM,CR,LF
TEXT2_LEN       EQU     $-text2         ;length of previous string

PRINT_STRING            EQU     9       ;DOS function
VIDEO_OUTPUT            EQU     2       ;DOS function
TERMINATE_PROGRAM       EQU     0       ;DOS function

start:

        ;Display a string terminated in "$" using DOS function 9

        mov     dx, OFFSET text1        ;DX -> text1
        mov     ah, PRINT_STRING        ;AH = DOS function
        int     21H                     ;show the DX->"string$"

        ;Display a string given its length via DOS function 2 and a loop

        lea     bx, text2               ;BX -> text2 (using LEA instead of OFFSET)
        mov     cx, TEXT2_LEN           ;CX = number of chars
        ;
nextChar:
        mov     dl, [bx]                ;DL = this char
        inc     bx                      ;advance BX to next char
        mov     ah, VIDEO_OUTPUT        ;AH = DOS function
        int     21H                     ;show the char
        loop     nextChar               ;and repeat for CX chars

        ;Terminate program

        mov     al, 0                   ;AL = errorlevel
        mov     ah, TERMINATE_PROGRAM   ;AH = DOS function
        int     21H                     ;terminate program

:example.com

rem The previous ":filename.com" line marks the end of the assembly source code

if errorlevel 1 echo Error in assembly & goto :EOF

echo Run example.com program:
example
goto :EOF



+===================================================+
|       Assembler "in-line" (:asm subroutine)       |
+===================================================+

:asm filename
setlocal EnableDelayedExpansion

set "_ascii= ^!"#$%%^&'()*+,-./0123456789:;^<=^>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^
^^^_`abcdefghijklmnopqrstuvwxyz{^|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬^
­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"

rem Define error messages
set "i=1"
for %%a in (
             "_badAddr=Addressing mode not implemented or invalid: '%%_errVal%%'"
              "_noMore=Code label can not include any additional element"
             "_noLabel=Code label not found: '%%_errVal%%'"
            "_farLabel=Code label '%%_errVal%%' too far %%_errVal2%% by %%_errVal3%% bytes"
             "_badType=Data type in LABEL directive not implemented or invalid: '%%_errVal%%'"
          "_notBothMem=Destination and source operands can not be both variables"
            "_badSizes=Destination and source operands must have the same size"
           "_notImmedD=Destination operand can not be a constant: '%%_errVal%%'"
              "_noSize=Destination operand have no size: '%%_errVal%%'"
                "_regD=Destination operand must be a register: '%%_errVal%%'"
             "_notData=Destination label can not be a data variable: '%%_errVal%%'"
              "_notYet=Instruction/directive not implemented: '%%_errVal%%'/'%%_errVal2%%'"
             "_xchgOps=Instruction not implemented in this form; exchange the operands"
             "_notByte=Operand can not be Byte size: '%%_errVal%%'"
              "_notVar=Operand must be a data variable: '%%_errVal%%'"
           "_notImmedS=Source operand can not be a constant: '%%_errVal%%'"
             "_notRegS=Source operand can not be a register: '%%_errVal%%'"
              "_syntax=Syntax error"
              "_notDef=Undefined variable: '%%_errVal%%'"
           ) do (
   for /F "tokens=1,2 delims==" %%b in (%%a) do (
      set /A "i+=1, %%b=i"
      set "errorMssg[!i!]=%%c"
   )
)

rem Define op-codes for No operand (string) operations
for %%a in ( "cld=0xFC"  "movsB=0xA4"    "rep=0xF3"
             "std=0xFD"  "movsW=0xA5"  "repNE=0xF2"
                         "cmpsB=0xA6"  "repNZ=0xF2"
                         "cmpsW=0xA7"   "repE=0xF3"
                         "stosB=0xAA"   "repZ=0xF3"
                         "stosW=0xAB"
                         "lodsB=0xAC"
                         "lodsW=0xAD"
                         "scasB=0xAE"
                         "scasW=0xAF"   ) do (
   for /F "tokens=1,2 delims==" %%b in (%%a) do (
      set /A "NoOperCode[%%b]=%%c"
   )
)

rem Define op-codes for One operand operations: opCode+regPart
for %%a in ( "inc=0xFE+0"  "not=0xF6+2"   "pop=0x8F+0"
             "dec=0xFE+1"  "neg=0xF6+3"  "push=0xFF+6"
                           "mul=0xF6+4"
                          "imul=0xF6+5"
                           "div=0xF6+6"
                          "idiv=0xF6+7"  ) do (
   for /F "tokens=1-3 delims==+" %%b in (%%a) do (
      set /A "OneOperCode[%%b]=%%c, reg[%%b]=%%d"
   )
)

rem Define op-codes for Two operand operations: reg/mem&reg/mem,reg/mem&immed+regPart
for %%a in ( "add=0x00,0x80+0"  "test=0x84,0xF6+0"  "xchg=0x86"
              "or=0x08,0x80+1"
             "and=0x20,0x80+4"   "mov=0x88,0xC6+0"   "lea=0x8D"
             "sub=0x28,0x80+5"
             "xor=0x30,0x80+6"
             "cmp=0x38,0x80+7"            ) do (
   for /F "tokens=1-4 delims==,+" %%b in (%%a) do (
      set /A "TwoOperCode[%%b]=%%c, TwoOperImmed[%%b]=%%d, dest_reg[%%b]=%%e" 2> NUL
   )
)
rem Cancel errorlevel=1 from special cases
ver > NUL

rem Locate the start of the assembly code
set "_start="
for /F "delims=:" %%a in ('findstr /N ":%1" "%~F0"') do (
   if not defined _start set "_start=%%a"
)

rem Assemble the code and generate auxiliary code-blocks
del %1 "%~N1.lst" 2> NUL
set /P "=Assembling." < NUL
set "pc=10000"  // Program counter for processed input lines
set "$=256"     // ORG 100H   ;instruction pointer for object code
set "_errorCode="
for /F "usebackq skip=%_start% tokens=*" %%a in ("%~F0") do (
   if /I "%%a" equ ":%1" goto :@F
   set /A "pc+=1, pcMOD10=pc%%10"
   if !pcMOD10! equ 0 set /P "=." < NUL
   if defined .list set "[ !pc:~1! @ !$!  ]=            %%a"
   rem Assemble some instructions individually: jmp, loop's, call, ret, int, aad, aam
   call :%%a 2> NUL
   rem Assemble the rest of instructions in groups: jCond's and by number of operands
   if !errorlevel! equ 1 call :asmGroups %%a
   if errorlevel 2 set "_errorCode=!errorlevel!" & set "_errorLine=%%a" & echo/ & goto asmEnd
)
:@F
echo/

rem Fix-up forward references
set /P "=Fixing labels." < NUL
set "_errorLine="
for /F "tokens=2,3 delims=[]=" %%a in ('set fixUpNear[ 2^>NUL') do (
   set /P "=." < NUL
   if not defined %%b set "_errorCode=%_noLabel%" & set "_errVal=%%b" & echo/ & goto asmEnd
   set /A "_disp=%%b-![%%a]!"
   set "[%%a]=!_disp!"
)
for /F "tokens=2,3 delims=[]=" %%a in ('set fixUpShort[ 2^>NUL') do (
   set /P "=." < NUL
   if not defined %%b set "_errorCode=%_noLabel%" & set "_errVal=%%b" & echo/ & goto asmEnd
   for /F "tokens=1,2 delims=," %%c in ("![%%a]!") do (
      set /A "_disp=%%b-%%d, _exceed=_disp-127"
      if !_exceed! gtr 0 (
         set "_errorCode=%_farLabel%"
         set "_errVal=%%b"
         set "_errVal2=ahead"
         set "_errVal3=!_exceed!"
         echo/
         goto asmEnd
      )
      set "[%%a]=%%c,!_disp!"
   )
)
echo/

rem Create PutBytes.com auxiliary program, if not exists
if exist PutBytes.com goto :@F
setlocal DisableDelayedExpansion
set LF=^
%empty line 1/2%
%empty line 2/2%
< NUL (
   set /P "=ë0¬<"tZ^<'tV^<0rG^<9wC,0Šàë,Í!€þ,tâë4ŠðŠÔŠç€úÿ"
   setlocal EnableDelayedExpansion
   set /P "=tì€ìüëç³q€ëd·f€ïd2ä°‚‹ðü뽬<0rØ<9wÔ,0Õ!LF!"
   endlocal
   set /P "=Šàëï3ÀÍ!¬<,t£ëõŠð¬:Ætò:ÃtêŠÐŠçÍ!ëï"
) > PutBytes.com
endlocal
:@F

rem Generate the executable code from auxiliary code-blocks
set /P "=Generating object code." < NUL
set "pc=0"
(for /F "tokens=2,3 delims=@]=" %%a in ('set [') do (
   set /A "pc+=1, pcMOD20=pc%%20"
   if !pcMOD20! equ 0 set /P "=." < NUL > CON
   if "%%a" equ " Byte " (
      PutBytes %%b
   ) else if "%%a" equ " Word " (
      set "line="
      for %%c in (%%b) do (
         set /A "lowByte=(%%c&0xFF), highByte=(%%c&0xFF00)>>8"
         set "line=!line!,!lowByte!,!highByte!"
      )
      PutBytes !line:~1!
   )
)) > %1
echo/
echo File %1 created

:asmEnd
if not defined .list goto :@F
(
   echo/
   echo APA = %date% %time:~0,-3% = Assembly of %1 in "%~NX0"
   echo/
   echo/
   echo [ line @ offset]=           SOURCE LINE
   echo [ line @ type ]=VALUES OF GIVEN TYPE
   echo/
   set [
   call :checkError
   echo/
   set fixUp 2> NUL
   echo/
   set symbol 2> NUL
   echo/
   set sizeOf[ 2> NUL
   echo/
) > "%~N1.lst"
:@F

:checkError
if defined _errorCode (
   echo/
   call echo ERROR:  !errorMssg[%_errorCode%]!
   if defined _errorLine echo at line %pc:~1%: "%_errorLine%"
   exit /B 1
)
exit /B 0



======= Assemble a couple instructions individually ========

:aad
rem // Op-code of AAD
set /A "$+=2, code=0xD5, byte2=0x0A"
set "[ %pc:~1% @ Byte ]=%code%,%byte2%"
exit /B 0

:aam
rem // Op-code of AAM
set /A "$+=2, code=0xD4, byte2=0x0A"
set "[ %pc:~1% @ Byte ]=%code%,%byte2%"
exit /B 0

:aad16
rem // "Macro" equivalent to AAD with factor=16
set /A "$+=2, code=0xD5, byte2=0x10"
set "[ %pc:~1% @ Byte ]=%code%,%byte2%"
exit /B 0

:aam16
rem // "Macro" equivalent to AAM with divisor=16
set /A "$+=2, code=0xD4, byte2=0x10"
set "[ %pc:~1% @ Byte ]=%code%,%byte2%"
exit /B 0


======================================================
======= Assemble instructions grouped by type ========
======================================================

:asmGroups instruction
goto :noOper


The word after "@" in the code-blocks specifies the size: Byte or Word.
Each block contains a series of values of that size that will be used to generate
the object code. Only Byte size blocks may include strings. For example:

set block[ %pc% @ Byte ]=1,2,3,"String",13,10,0
set block[ %pc% @ Word ]=12345,6789,4321


======= Assemble no operand (string) instructions =======
   cld, std,  lodsB/W, stosB/W, movsB/W, cmpsB/W, scasB/W,  rep, repE/Z, repNE/NZ

:noOper code
if not defined NoOperCode[%1] goto oneOper
set /A "$+=1"
set "[ %pc:~1% @ Byte ]=!NoOperCode[%1]!"
exit /B 0


======= Assemble one operand instructions =======
   inc, dec,  not, neg,  mul, imul, div, idiv,  push, pop

:oneOper code oper
if not defined OneOperCode[%1] goto twoOper
if "%~2" equ "" exit /B %_syntax%
rem Value required in :addressingMode for PUSH immed instruction
if /I %1 equ PUSH set "dest_w=1" 
call :addressingMode %2 & if errorlevel 2 exit /B !errorlevel!
if defined immed goto pushImmed
for %%a in (PUSH POP) do if /I %1 equ %%a if %w% neq 1 (
   set "_errVal=%2" & exit /B %_notByte%
)
set /A "$+=2, code=OneOperCode[%1]|w, byte2=(mod<<6) | (reg[%1]<<3) | r_m"
set "[ %pc:~1% @ Byte ]=%code%,%byte2%"
if defined disp set /A "$+=2" & set "[ %pc:~1% @ Word ]=%disp%"
goto :@F
:pushImmed
set /A "$+=3"
set "[ %pc:~1% @ Byte ]=0x68"  // Op-code of PUSH immed16
set "[ %pc:~1% @ Word ]=%immed%"
:@F
exit /B 0


======= Assemble two operands instructions =======
   mov, lea, xchg,  add, sub,  and, or, xor,  cmp, test

:twoOper code dest,source
if not defined TwoOperCode[%1] goto jCond
if "%~3" equ "" exit /B %_syntax%
set "dest_w="
call :addressingMode %2 dest_ & if errorlevel 2 exit /B !errorlevel!
if /I "%~3" neq "OFFSET" (
   call :addressingMode %3 source_ & if errorlevel 2 exit /B !errorlevel!
) else (
   call :checkVar %4 & if errorlevel 2 exit /B !errorlevel!
   set "source_immed=!%4!"
)
if defined dest_reg (  rem twoOper reg,...
   if not defined source_immed (  rem twoOper reg,reg  or  reg,mem
      set "d=1"  //  op1=dest = reg, op2=source = mod+r_m
      rem Check special cases
      if /I %1 equ LEA (
         if defined source_reg set "_errVal=%~3" & exit /B %_notRegS%
         if "%dest_w%" equ "0" set "_errVal=%2" & exit /B %_notByte%
         set "d=0"
      ) else (
         if defined source_w if "%dest_w%" neq "%source_w%" exit /B %_badSizes%
         if /I %1 equ TEST set "d=0"
      )
      set /A "$+=2, code=TwoOperCode[%1] | (d<<1) | dest_w"
      set /A      "byte2=(source_mod<<6) | (dest_reg<<3) | source_r_m"
      set "[ %pc:~1% @ Byte ]=!code!,!byte2!"
      if defined source_disp (  rem twoOper reg,mem
         if /I %1 equ XCHG exit /B %_xchgOps%
         set /A "$+=2"
         set "[ %pc:~1% @ Word ]=%source_disp%"
      )
   ) else (  rem twoOper reg,immed
      if not defined TwoOperImmed[%1] set "_errVal=%~3" & exit /B %_notImmedS%
      set /A "$+=2, code=TwoOperImmed[%1] | dest_w"
      set /A      "byte2=(dest_mod<<6) | (dest_reg[%1]<<3) | dest_r_m"
      set "[ %pc:~1% @ Byte ]=!code!,!byte2!"
      if "%dest_w%" equ "0" (  rem Dest is Byte
         set /A "$+=1"
         set "[ %pc:~1% @ Byte ]=![ %pc:~1% @ Byte ]!,%source_immed%"
      ) else (  rem Dest is Word
         set /A "$+=2"
         set "[ %pc:~1% @ Word ]=%source_immed%"
      )
   )
) else ( rem twoOper mem,...
   if not defined source_immed ( rem twoOper mem,reg  or  mem,mem
      set "d=0"  //  op1=dest = mod+r_m, op2=source = reg
      if defined source_reg ( rem twoOper mem,reg
         if /I %1 equ LEA set "_errVal=%2" & exit /B %_regD%
         if /I %1 equ TEST exit /B %_xchgOps%
         if defined dest_w if "%dest_w%" neq "%source_w%" exit /B %_badSizes%
         set /A "$+=2, code=TwoOperCode[%1] | dest_w"
         set /A      "byte2=(dest_mod<<6) | (source_reg<<3) | dest_r_m"
         set "[ %pc:~1% @ Byte ]=!code!,!byte2!"
         rem Only operands with a displacement emit the extra Word
         if defined dest_disp set /A "$+=2" & set "[ %pc:~1% @ Word ]=%dest_disp%"
      ) else ( rem twoOper mem,mem
         exit /B %_notBothMem%
      )
   ) else ( rem twoOper mem,immed
      if not defined TwoOperImmed[%1] set "_errVal=%~3" & exit /B %_notImmedS%
      if not defined dest_w set "_errVal=%2" & exit /B %_noSize%
      set /A "$+=4, code=TwoOperImmed[%1] | dest_w"
      if defined dest_disp set /A "$+=2"
      set /A      "byte2=(dest_mod<<6) | (dest_reg[%1]<<3) | dest_r_m"
      set "[ %pc:~1% @ Byte ]=!code!,!byte2!"
      if "%dest_w%" equ "0" (
         rem Dest is Byte: pad the last Word with a NOP after the 8-bits constant
         set /A "source_immed+=0x90<<8"
      )
      set "[ %pc:~1% @ Word ]=%dest_disp%,!source_immed!"
   )
)

exit /B 0


======= Assemble transfer instructions =======
   Jcond instructions by related group
   jmp, jcxz, loop, call, ret and int individually

:jCond Jcond [SHORT] label
set "cond=%1"
if /I "%cond:~0,1%" neq "J" goto asmLabel
set "cond=0"
for %%a in ( JO JNO JB   JNB JZ JNZ JBE JNBE JS JNS JP  JNP JL   JNL JLE JNLE
             _  _   JNAE JAE JE JNE JNA JA   _  _   JPE JPO JNGE JGE JNG JG
             _  _   JC   JNC ) do (
   if /I "%1" equ "%%a" set /A "cond&=0x0F" & goto :@F
   set /A cond+=1
)
goto asmLabel
:@F
shift
if /I "%1" neq "SHORT" (
   rem // Op-code of Jcond disp16 = Near
   set /A "$+=1, code=0x0F, byte2=0x80+cond"
   set "[ %pc:~1% @ Byte ]=!code!,!byte2!"
   goto jmpNearTail
) else (
   rem // Op-code of Jcond disp8 = Short
   set /A "code=0x70+cond"
   shift
   goto jmpShortTail
)

:jmp [SHORT] label
if /I "%1" neq "SHORT" (
   rem // Op-code of JMP disp16 = Near Direct
   set /A "code=0xE9"
   set "[ %pc:~1% @ Byte ]=!code!"
   goto jmpNearTail
) else (
   rem // Op-code of JMP disp8 = Short Direct
   set /A "code=0xEB"
   shift
   goto jmpShortTail
)

:call label
rem // Op-code of CALL disp16 = Near Direct
set /A "code=0xE8"
set "[ %pc:~1% @ Byte ]=%code%"

:jmpNearTail label
set /A "$+=3"
if defined %1 (
   if defined sizeOf[%1] set "_errVal=%1" & exit /B %_notData%
   set /A "disp=%1-$"
) else (
   set /A "disp=$"
   set "fixUpNear[ %pc:~1% @ Word ]=%1"
)
set "[ %pc:~1% @ Word ]=%disp%"
exit /B 0


:loop label
rem // Op-code of LOOP disp8 = Short
set /A "code=0xE2" & goto jmpShortTail
:loopE label
:loopZ label
rem // Op-code of LOOPE/Z disp8 = Short
set /A "code=0xE1" & goto jmpShortTail
:loopNE label
:loopNZ label
rem // Op-code of LOOPNE/NZ disp8 = Short
set /A "code=0xE0" & goto jmpShortTail

:jcxz label
rem // Op-code of JCXZ
set /A "code=0xE3"

:jmpShortTail label
set /A "$+=2"
if defined %1 (
   if defined sizeOf[%1] set "_errVal=%1" & exit /B %_notData%
   set /A "disp=%1-$, _exceed=-(128+disp), disp&=0xFF"
   if !_exceed! gtr 0 (
      set "_errVal=%1"
      set "_errVal2=behind"
      set "_errVal3=!_exceed!"
      exit /B %_farLabel%
   )
) else (
   set /A "disp=$"
   set "fixUpShort[ %pc:~1% @ Byte ]=%1"
)
set "[ %pc:~1% @ Byte ]=%code%,%disp%"
exit /B 0

:ret
rem // Op-code of RET = Near
set /A "$+=1, code=0xC3"
set "[ %pc:~1% @ Byte ]=%code%"
exit /B 0

:int intNum
set "intNum=%1"
if /I "%intNum:~-1%" equ "H" set /A "intNum=0x%intNum:~0,-1%"
rem // Op-code of INT
set /A "$+=2, code=0xCD"
set "[ %pc:~1% @ Byte ]=%code%,%intNum%"
exit /B 0


======= Assemble code labels and EQU, LABEL, DB and DW directives ========

:asmLabel  codeLabel:  |  constLabel EQU value  |
::         dataLabel LABEL {BYTE|WORD}  |  [dataLabel] {DB|DW} list,of,values

set "_label=%1"
if "%_label:~-1%" neq ":" goto checkEQU

:codeLabel
set "%_label:~0,-1%=%$%"
if defined .list set "[ %pc:~1% @ Label]=%$%" & set "symbol %_label:~0,-1% = %$%"
if "%~2" neq "" exit /B %_noMore%
exit /B 0

:checkEQU
if /I "%~2" neq "EQU" goto checkLABEL
if "%~3" equ %3 set _value="%~3"& goto if _value is char
set "_value=%3"
if "%_value:~0,1%%_value:~-1%" neq "''" goto else
:if _value is char
   set "_char=!_value:~1,1!"
   for /L %%i in (0,1,223) do if "!_char!" equ "!_ascii:~%%i,1!" set /A "_value=%%i+32"
   goto endif
:else
   if /I "%_value:~-1%" equ "H" set "_value=0x%_value:~0,-1%"
:endif
set /A "%_label%=%_value%"
if defined .list set "[ %pc:~1% @ Const]=!%_label%!" & set "symbol %_label% = !%_label%!"
exit /B 0

:checkLABEL
if /I "%~2" neq "LABEL" goto checkDW
set "%_label%=%$%"
if defined .list set "symbol %_label% = %$%"
if /I "%~3" equ "BYTE" set "sizeOf[%_label%]=1"
if /I "%~3" equ "WORD" set "sizeOf[%_label%]=2"
if not defined sizeOf[%_label%] set "_errVal=%3" & exit /B %_badType%
exit /B 0

:checkDW
for %%a in (DB DW) do if /I "%_label%" equ "%%a" set "_label="
if defined _label (
   set "%_label%=%$%"
   if defined .list set "symbol %_label% = %$%"
   shift
)
set "_block="
if /I "%~1" neq "DW" goto checkDB
if defined _label set "sizeOf[%_label%]=2"
:nextW
   shift
   set "_value=%~1"
   if not defined _value set "[ %pc:~1% @ Word ]=%_block:~1%" & exit /B 0
   if /I "%_value:~-1%" equ "H" set "_value=0x%_value:~0,-1%"
   set /A "_value=%_value%"
   set "_block=%_block%,%_value%"
   set /A $+=2
goto nextW

:checkDB
if /I "%~1" neq "DB" set "_errVal=%_label%" & set "_errVal2=%1" & exit /B %_notYet%
if defined _label set "sizeOf[%_label%]=1"
:nextB
   shift
   if "%~1" equ "" set "[ %pc:~1% @ Byte ]=!_block:~1!" & exit /B 0
   if "%~1" equ %1 set _value="%~1"& goto if _value is string
   set "_value=%1"
   if "%_value:~0,1%%_value:~-1%" neq "''" goto else
   :if _value is string
      set _block=!_block!,"!_value:~1,-1!"
      set "_len=0"
      for /L %%i in (5,-1,0) do (
         set /A "_newLen=_len+(1<<%%i)"
         for %%j in (!_newLen!) do if "!_value:~%%j,1!" neq "" set "_len=!_newLen!"
      )
      set /A "$+=_len-1"
      goto endif
   :else
      if /I "!_value:~-1!" equ "H" set "_value=0x!_value:~0,-1!"
      set /A "_value=(%_value%)&0xFF"
      set "_block=!_block!,%_value%"
      set /A "$+=1"
   :endif
goto nextB



======= Auxiliary subroutine that identify the addressing mode of an operand

   Parameters: operand returnPrefix
   Returns values in variables with the return prefix given and these names:
      "mod", "reg", "r_m" and "w"
      also "disp" if the operand include "var+const", "var" or "+const"
      or "immed" if the operand is *just* "const"

:addressingMode operand returnPrefix=
setlocal EnableDelayedExpansion

set "reg="
set "r_m="
set "disp="
set "immed="

rem Identify if operand is a CPU register
set /A mod=3, w=0,  i=0
for %%a in (AL CL DL BL AH CH DH BH) do (
   if /I "%~1" equ "%%a" set /A "reg=r_m=i" & goto modeOK
   set /A i+=1
)
set /A        w=1,  i=0
for %%a in (AX CX DX BX SP BP SI DI) do (
   if /I "%~1" equ "%%a" set /A "reg=r_m=i" & goto modeOK
   set /A i+=1
)
set "w="

rem Check if operand is enclosed in quotes
set "_char="
if "%~1" equ %1 set _operand=%1& set "_char=!_operand:~1,1!" & goto else_Operand_is_const

rem Check operand with this format: var[base+index]+const
set "_operand=%1"
if "%_operand:[=%" equ "%_operand%" goto checkVarConst

rem Operand have the [base+index] part
for /F "tokens=1-3 delims=[]" %%a in ("{%_operand%}") do (
   set "_var=%%a" & set "base_index=[%%b]" & set "_const=%%c"
)
set "_var=%_var:~1%" & set "_const=%_const:~0,-1%"

rem Identify the [base+index] part
set i=0
for %%a in ([BX+SI] [BX+DI] [BP+SI] [BP+DI] [SI] [DI] [BP] [BX]) do (
   if /I "%base_index%" equ "%%a" set "r_m=!i!" & goto :@F
   set /A i+=1
)
endlocal & set "_errVal=%base_index%" & exit /B %_badAddr%
:@F
if "%_var%%_const%" neq "" goto :checkDisp
rem Operand is [base+index] with no disp
if /I "%base_index%" neq "[BP]" (
   rem Standard cases
   set "mod=0"
) else (
   rem Special case: [BP] with no disp, insert a disp16=0
   set /A "disp=0, mod=2"
)
goto modeOK
:checkDisp
rem Operand is [base+index] with disp (var+const)
if defined _var call :checkVar %_var%
if errorlevel 2 endlocal & set "_errVal=%_errVal%" & exit /B %errorlevel%
set /A "disp=%_var%%_const%, mod=2"
goto modeOK

:checkVarConst
rem Operand have no base_index part: is var+const, var or const
if "!_operand:~0,1!" equ "-" goto else
for %%s in (+ -) do if "!_operand:%%s=!" neq "%_operand%" set "_sign=%%s" & goto if defined _sign
goto else
:if defined _sign
   rem Operand is var+const
   for /F "delims=%_sign%" %%a in ("%_operand%") do (
      call :checkVar %%a
      if errorlevel 2 for /F %%b in ("!_errVal!") do endlocal & set "_errVal=%%b" & exit /B !errorlevel!
   )
   set /A "disp=%_operand%, r_m=6, mod=0"
   goto endif
:else
   call :checkVar %_operand% > NUL & if errorlevel 2 goto else_Operand_is_const
   :if not errorlevel 2
      rem Operand is var
      set /A "disp=%_operand%, r_m=6, mod=0"
      goto endif
   :else_Operand_is_const
      rem If is the first operand: is wrong (excepting in PUSH immed)
      if not defined dest_w endlocal & set "_errVal=%1" & exit /B %_notImmedD%
      if "%_operand:~0,1%%_operand:~-1%" equ "''" set "_char=%_operand:~1,1%"
      if defined _char (
         for /L %%i in (0,1,223) do if "!_char!" equ "!_ascii:~%%i,1!" set /A "_operand=%%i+32"
      ) else (
         if /I "!_operand:~-1!" equ "H" set "_operand=0x!_operand:~0,-1!"
         set "_max=0xFF"
         if "%dest_w%" equ "1" set "_max=0xFFFF"
         set /A "_operand=(!_operand!)&_max"
      )
      set "immed=!_operand!"
   :endif
:endif

:modeOk
(
   endlocal
   for %%a in ("mod=%mod%" "reg=%reg%" "r_m=%r_m%" "w=%w%" "disp=%disp%" "immed=%immed%") do set "%2%%~a"
)
exit /B 0


======= Auxiliary subroutine that check if a data variable exist

:checkVar var

if "%~1" equ "" exit /B %_syntax%
if defined %1 (
   if defined sizeOf[%1] (
      set /A "w=sizeOf[%1]-1"
   ) else (
      set "_errVal=%1" & exit /B %_notVar%
   )
) else (
   set "_errVal=%1" & exit /B %_notDef%
)
exit /B 0
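
The Byte/Word code-block scheme used throughout the listing above can be summarized in a few lines. This is only a sketch in Python (chosen because it is easy to check, it is not part of the Batch program); the name `emit_block` is hypothetical. It mirrors what the "Generating object code" loop does: Byte blocks are written as they are, and each value of a Word block is split into low byte and high byte (little-endian order).

```python
def emit_block(kind, values):
    """Return the object-code bytes for one auxiliary code-block.

    kind   -- "Byte" or "Word", the word after "@" in the block name
    values -- list of integers; Byte blocks may also contain strings
    """
    out = bytearray()
    for value in values:
        if kind == "Byte":
            if isinstance(value, str):
                # Strings are only allowed in Byte blocks
                out += value.encode("latin-1")
            else:
                out.append(value & 0xFF)
        elif kind == "Word":
            # Same split as: lowByte=(v&0xFF), highByte=(v&0xFF00)>>8
            out.append(value & 0xFF)
            out.append((value >> 8) & 0xFF)
        else:
            raise ValueError("unknown block type: " + kind)
    return bytes(out)


if __name__ == "__main__":
    print(emit_block("Byte", [1, 2, 3, "String", 13, 10, 0]).hex(" "))
    print(emit_block("Word", [12345, 6789, 4321]).hex(" "))
```

Negative displacements (backward jumps) go through the same masking, so a Word value of -5 becomes the two bytes FB FF.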


Of course, this program can only generate .com files, which will not run in 64-bit versions of Windows. However, the most important aspect of Batch assembler is that it puts assembly language topics within reach of Batch file programmers, in the same way that other Batch file "chimeras" did with other languages (like JScript, VBS, PowerShell, mshta, jscript.net, etc.), so interested users can do further research on this point and even adopt some assembly practices in their Batch files (like I did with the ":@F" repeated forward label). Batch assembler can be used as an educational tool for learning assembly language basics, designed for Batch file programmers (not just for you, Ed! :mrgreen: ).

I tested Batch assembler in Win XP and 32-bit Win 8. I assembled a few of my large old DOS programs and it correctly generated .com files of a little less than 1 KB in size. However, I did not test all possible instruction/operand combinations, so certain specific forms may have errors. If you find a bug in Batch assembler, please report it (remember that the PTR operator is not yet implemented).

As I usually do in projects like this one ("proof of concept"), the first version of this program has very limited error checking. There are multiple situations that may crash the program, but if you write correct code you should obtain correct results (unless there is a bug!). Of course, more extensive error checking and more features could be added (making the program larger and slower), but I think that investing more effort in a program that can only generate .com files is just not worth it (unless new horizons open up).

I got a lot of pleasure out of writing Batch and assembly code in the same file! I hope you enjoy my Batch assembler program in the same way. :D

Antonio

Re: Assembly language code "in-line" for Batch files!

Posted: 22 Feb 2015 01:25
by Aacini
For those of you who may be interested, I describe here some general aspects of my Batch-assembler program. This is a list of the currently implemented features:

  • The words in this assembler are not case sensitive.
  • Only EQU, DB, DW and LABEL directives are implemented.
  • EQU directive can only define a numeric constant.
  • Valid types for LABEL directive are BYTE and WORD.
  • All variables must be defined in DB, DW or LABEL directives before they are used.
  • The only implemented operator is OFFSET of a variable. There is no DUP() for DB/DW directives.
  • A numeric constant may have these formats:
    • A decimal number.
    • A hexadecimal number that starts with a digit and ends in H, like 0ABH.
    • A single Ascii character enclosed in quotes or apostrophes. Because of the simplified parsing method used, the usual problematic characters can not be included: exclamation marks are removed and percent signs need to be inserted four times: mov al,"%%%%"; you may use their decimal values instead (!=33, %=37). Of course, no Batch separator/special character can be enclosed in apostrophes, including space and quote ("=34).
  • The DB directive also allows strings enclosed in quotes or apostrophes of up to 63 characters long, with the same restrictions.
  • The DW directive doesn't allow character constants, just numbers. You may use quotes in DW to enclose expressions with special Batch characters, as described below.

If you want to include an expression instead of a numeric constant in EQU, DB or DW directives, it must have the format used in the Batch SET /A command; this is the only incompatibility vs. standard assembly code. Expressions are not allowed in instruction operands. Some examples:

Code: Select all

upCaseA         EQU     "A"             ;ok, makes upCaseA=65
lowCasea        EQU     'a'             ;ok, makes lowCasea=97
twoLetters      EQU     "AB"            ;wrong, makes twoLetters=65 with NO warning!

conversion      EQU     'a'-'A'         ;not valid in this assembler
conversion      EQU     lowCasea-upCaseA;ok, makes conversion=32

mask            DW      8041H           ;ok
mask            DW      80H SHL 8 OR 41H;not valid in this assembler
mask            DW      "0x80<<8|0x41"  ;ok. NOT valid in standard assemblers!
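
The three numeric constant formats listed above can be sketched as follows. This is Python rather than Batch (only to make the rules easy to check), and `parse_constant` is a hypothetical name that does not appear in the assembler. As in the EQU examples, only the first character of a quoted constant is used, so "AB" silently gives 65.

```python
def parse_constant(text):
    """Return the value of a decimal, hexadecimal (0ABH) or character constant."""
    text = text.strip()
    # Character constant in quotes or apostrophes: only the first character
    # counts, which is why "AB" gives the same value as "A".
    if len(text) >= 3 and text[0] == text[-1] and text[0] in "\"'":
        return ord(text[1])
    # Hexadecimal constant: starts with a digit and ends in H (any case).
    if text[-1:].upper() == "H" and text[:1].isdigit():
        return int(text[:-1], 16)
    # Anything else is taken as a plain decimal number.
    return int(text)


if __name__ == "__main__":
    for sample in ('"A"', "'a'", '"AB"', "0ABH", "8041H", "13"):
        print(sample, "=", parse_constant(sample))
```

SET /A style expressions such as "0x80<<8|0x41" are deliberately left out of this sketch; in the assembler they are evaluated by SET /A itself.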

This is a list of the 16-bit 80286 CPU instructions implemented in the first version of Batch-assembler.

Code: Select all

One operand:    inc, dec; not, neg; mul, imul, div, idiv; push, pop.
Two operands:   mov, lea, xchg; add, sub; and, or, xor; cmp, test.
Transfer:       jmp and jCond's including SHORT operator, jcxz, loop's, call, ret, int.
Strings:        cld, std; lods, stos, movs, cmps and scas with B/W; rep's.
Various:        aad, aam, aad16, aam16.

Instruction operands must have one of the following formats:

Code: Select all

AL AH BL BH CL CH DL DH                 AX BX CX DX SI DI BP SP
var+const              var              const
var[BX]+const          var[BX]          [BX]+const          [BX]
var[BP]+const          var[BP]          [BP]+const          [BP]
var[SI]+const          var[SI]          [SI]+const          [SI]
var[DI]+const          var[DI]          [DI]+const          [DI]
var[BX+SI]+const       var[BX+SI]       [BX+SI]+const       [BX+SI]
var[BX+DI]+const       var[BX+DI]       [BX+DI]+const       [BX+DI]
var[BP+SI]+const       var[BP+SI]       [BP+SI]+const       [BP+SI]
var[BP+DI]+const       var[BP+DI]       [BP+DI]+const       [BP+DI]
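
Each of the memory formats above ends up as a ModR/M byte, optionally followed by a 16-bit displacement. The following Python sketch shows the mapping under the same simplifications the assembler makes (any displacement is emitted as a full word; [BP] alone gets a zero displacement). The function names are hypothetical; the table values are the standard 16-bit r/m codes.

```python
# r/m codes of the standard 16-bit ModR/M byte
RM_CODES = {
    "BX+SI": 0, "BX+DI": 1, "BP+SI": 2, "BP+DI": 3,
    "SI": 4, "DI": 5, "BP": 6, "BX": 7,
}
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def encode_memory_operand(base_index=None, disp=None):
    """Return (mod, r_m, disp16 or None) for one memory operand."""
    if base_index is None:
        # Plain "var" or "var+const": direct address, mod=0 with r_m=6
        return 0, 6, disp & 0xFFFF
    r_m = RM_CODES[base_index.upper()]
    if disp is None:
        if r_m == 6:
            # [BP] alone has no mod=0 form, so a zero disp16 is inserted
            return 2, 6, 0
        return 0, r_m, None
    # Any displacement is emitted as a 16-bit word (mod=2)
    return 2, r_m, disp & 0xFFFF


def modrm_byte(mod, reg, r_m):
    return (mod << 6) | (reg << 3) | r_m


def assemble_mov_reg8_mem(reg_name, base_index=None, disp=None):
    """Bytes of MOV reg8, mem (op-code 0x8A), as one worked example."""
    mod, r_m, disp16 = encode_memory_operand(base_index, disp)
    code = [0x8A, modrm_byte(mod, REG8.index(reg_name.upper()), r_m)]
    if disp16 is not None:
        code += [disp16 & 0xFF, disp16 >> 8]
    return bytes(code)


if __name__ == "__main__":
    print(assemble_mov_reg8_mem("AL", "BX").hex(" "))         # mov al, [bx]
    print(assemble_mov_reg8_mem("AL", "BX", 0x103).hex(" "))  # mov al, var[bx]
```

For instance, with a variable at offset 103H, `mov al, var[bx]` comes out as 8A 87 03 01 and `mov al, [bx]` as 8A 07.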

Comments in instructions do not really start at the semicolon. Only the required operands of each instruction are processed and the rest of the line is ignored (semicolons are stripped because they are Batch delimiters). You may also separate operands by spaces (or semicolons or equal-signs), but you can NOT include spaces inside an operand. For the same reason, you can not include comments in DB or DW directives, and nothing at all after a code label. Here are some examples:

Code: Select all

                                        ;no comment in line below:
letter  DB      "_ABCDEFGHIJKLMNOPQRSTUVWXYZ"

        mov     al, letter              ;load "_" in AL
        mov     al  letter              ;load "_" in AL (not valid in standard assemblers)
        mov     al, letter + 3          ;load "_" in AL (" + 3" is ignored)
        mov     al, letter+3            ;load "C" in AL
        mov     al, letter[3]           ;not valid in this assembler

        mov     bx, 3                   ;BX = 3
        mov     al, letter[bx]          ;load "C" in AL
        mov     al, letter+bx           ;not valid in this assembler

        mov     bx, OFFSET letter       ;BX -> letter
        mov     al, [bx]+9              ;load "I" in AL
        mov     al, [bx+9]              ;not valid in this assembler

        mov     si, 9                   ;SI = 9
        mov     al, [bx+si]             ;load "I" in AL
        mov     al, [si+bx]             ;not valid in this assembler

        mov     al, [bx+si]+6           ;load "O" in AL
        mov     al, [bx+si+6]           ;not valid in this assembler
        mov     al, [bx+si]-6           ;load "C" in AL
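
The behaviour shown in these examples follows from the way Batch splits a line into parameters. The sketch below imitates that splitting in Python; it assumes that space, tab, comma, semicolon and equal-sign are the separators and that text between quotes stays together. `split_like_batch` is a hypothetical name, and real cmd.exe parsing has more corner cases than this.

```python
import re

# A token is a run of quoted pieces and non-separator characters.
# Assumed separators: space, tab, comma, semicolon and equal-sign.
_TOKEN = re.compile(r'(?:"[^"]*"|[^\s,;=])+')


def split_like_batch(line):
    """Split a source line roughly the way Batch splits parameters."""
    return _TOKEN.findall(line)


if __name__ == "__main__":
    # A two-operand instruction only looks at the first three tokens,
    # so " + 3" and the comment are never seen by the assembler.
    print(split_like_batch('mov     al, letter + 3   ;load "_" in AL')[:3])
    # Without spaces the whole operand arrives as one token.
    print(split_like_batch("mov     al, letter+3")[:3])
    # In DB every token is a value, so a trailing comment would be data.
    print(split_like_batch('text2   DB  "World",EXCLAM,CR,LF'))
```

This also shows why a comment after DB or DW is not possible: every remaining token is taken as one more value of the list.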

Note that assembly variables and labels are stored in Batch variables with same names. This means that you may use Batch variables defined before the assembly code as if they were defined via EQU directives. However, if the assembly code use a not defined variable or label and there is a Batch variable with same name, the program will fail.

The program will fail if there is a Batch label with the same name as an assembly instruction, or if you write an assembly instruction with the same name as an existing Batch label. It will also fail if the assembly code uses any label or variable with the same name as a Batch variable used by the assembler itself. I tried to add an underscore prefix to all variables used in the assembler, so you are safe if your assembly identifiers don't start with an underscore, but there are a couple of Batch variables that still don't have it.

Unlike in other Batch programs that generate binary bytes, in this case I chose a .com program to create the binary values. Of course, this method can be replaced by the well-known pure Batch file solution or the fast JScript/VBS one; however, I think it is rather logical to use a .com file in a program designed to create .com files. After all, it doesn't make much sense to create a .com file on a computer that cannot execute it, does it?

The program that generates the binary bytes is PutBytes.com. This file consists of just printable Ascii characters, so it can be created directly by ECHO commands placed in the Batch file. Control characters are avoided in the executable code via a series of simple tricks: creating numbers less than 32 by subtracting two larger numbers, avoiding ADD and OR instructions (because their codes are 0 and 8 respectively), using only SHORT jumps, and rearranging certain parts of the code in order to avoid jumps to forward labels that are too close (the displacement byte of such a jump would be less than 32). Here it is:

Code: Select all

        ;PutBytes.asm: Print a series of comma-separated bytes given as numbers 0-255
        ;              or as strings enclosed in quotes or apostrophes
        ;              for example: PutBytes "Hello",13,10,'World',33,13,10

        ;Antonio Perez Ayala - 2015/02/18


        jmp     SHORT initialize

QUOTE   EQU     34                      ;Ascii code of quote

nextArg:
        lodsb                           ;AL = [SI++]
        cmp     al, QUOTE               ;is quote?
        je      SHORT isString          ;yes: is string
        cmp     al, "'"                 ;is apostrophe?
        je      SHORT isString          ;yes: is string
        ;
        cmp     al, "0"                 ;below "0"?
        jb      SHORT progEnd           ;yes: terminate
        cmp     al, "9"                 ;above "9"?
        ja      SHORT progEnd           ;yes: terminate
        ;                               ;else: is number
        sub     al, "0"                 ;convert first digit to binary
        mov     ah, al                  ;AH = first digit
        jmp     SHORT nextDigit

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

showByte:
        int     21H                     ;print the byte
        ;
        cmp     dh, ","                 ;another argument?
        je      SHORT nextArg           ;yes: go back for it
        jmp     SHORT progEnd           ;else: terminate
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
numberEnd:
        mov     dh, al                  ;DH = last char
        mov     dl, ah                  ;DL = byte value
        mov     ah, bh                  ;AH = VIDEO_OUTPUT function (2)
        ;
        cmp     dl, 255                 ;byte is 0FFH?
        je      SHORT showByte          ;yes: use function 2
        sub     ah, -4                  ;else: use function 6 (DIRECT_CONSOLE_IO)
        jmp     SHORT showByte

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

initialize:
        mov     bl, 113                 ;BL = 113
        sub     bl, 100                 ;    -100 = 13 (CR)
        mov     bh, 102                 ;BH = 102
        sub     bh, 100                 ;    -100 = 2 (function: VIDEO_OUTPUT)
        xor     ah, ah                  ;AH = 0
        mov     al, 82H                 ;AX = 82H
        mov     si, ax                  ;SI -> arguments
        cld                             ;to increment SI index
        jmp     SHORT nextArg           ;and jump to start the process

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

nextDigit:
        lodsb                           ;AL = [SI++]
        cmp     al, "0"                 ;below "0"?
        jb      SHORT numberEnd         ;yes: number end
        cmp     al, "9"                 ;above "9"?
        ja      SHORT numberEnd         ;yes: number end
        ;
        sub     al, "0"                 ;AL = thisDigit
        aad                             ;AL = previousNumber*10+thisDigit
        mov     ah, al                  ;AH = newNumber
        jmp     SHORT nextDigit

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

progEnd:
        xor     ax,ax                   ;AL = 0 (errorlevel), AH = 0 (terminate)
        int     21H                     ;terminate program

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

stringEnd:
        lodsb                           ;AL = [SI++]
        cmp     al, ","                 ;another argument?
        je      SHORT nextArg           ;yes: go back
        jmp     SHORT progEnd           ;else: terminate

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

isString:
        mov     dh, al                  ;DH = delimiter
        ;
nextChar:
        lodsb                           ;AL = [SI++]
        cmp     al, dh                  ;string ends?
        je      SHORT stringEnd         ;yes: jump
        cmp     al, bl                  ;argument ends?
        je      SHORT progEnd           ;yes: terminate
        ;
        mov     dl, al                  ;DL = this char
        mov     ah, bh                  ;AH = VIDEO_OUTPUT function
        int     21H                     ;print the byte
        jmp     SHORT nextChar


The PutBytes.asm program is also a good example of the type of applications that can be created with the Batch assembler. You may assemble it and you will obtain a fully working program equivalent to the one used in the Batch assembler. If you do so, you will note that the two .com files are not identical; this is because the Batch assembler doesn't use the simplified forms of certain instructions that work on the AL/AX registers, as standard assemblers do.
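
To make the last point concrete: several x86 instructions have a short form that only works on AL/AX, plus a general form that works on any register, and both do exactly the same thing. Below is a minimal sketch with "cmp al, 34", assuming that PutBytes.com is in the current folder and that the system can run 16-bit programs; short.bin and general.bin are hypothetical file names, and the sketch only shows the general encoding rule, not the exact bytes that the Batch assembler emits for every instruction.

```batch
@echo off
rem Sketch: the two valid encodings of   cmp al, 34

rem Short AL-only form (opcode 3CH = 60), the one standard assemblers pick
PutBytes 60,34 > short.bin

rem General 8-bit register form (opcode 80H = 128, ModRM 0F8H = 248)
PutBytes 128,248,34 > general.bin

rem Compare the sizes of both files
dir short.bin general.bin
```

If everything works, the first file should be 2 bytes long and the second one 3 bytes, which is why a .com file built from the general forms is a little larger than the one produced by a standard assembler.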

Antonio

Re: Assembly language code "in-line" for Batch files!

Posted: 23 Feb 2015 10:56
by miskox
@Aacini: this is great!

I did some assembler programming (Z80A (ZX Spectrum) and SC61860 (some SHARP Pocket Computers)), but I have very little x86 experience. It would be nice if you could do some 'tutorials' or something like that (if allowed by the Admins, because this is a DOS forum, unless some more users find it helpful).

I am trying to do some work again in x86 assembler (disassembling at this time, to be correct), so this example.com gives me some additional information I need. Thanks!

Of course I will study your BatchAssembler when time permits.

Saso