HASHSUM.BAT v1.8 - emulate md5sum, shasum, and the like

mmetts
Posts: 1
Joined: 01 Nov 2019 15:24

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#31 Post by mmetts » 01 Nov 2019 15:36

Hello, I love this script! Thanks for writing it. Recently, I got this error:
--------------------------------------------
C:\Temp>hashsum /C md5sums.md5
---------- "md5sums.md5" ----------
find: '/i': No such file or directory
find: '*.md5*': No such file or directory
*FAILED: LighthouseStudio.log
========== SUMMARY ==========
Total manifests = 1
Matched files = 0
Failed files = 1
--------------------------------------------
What I discovered was that the particular Windows system I was using has another `find` executable earlier in the path, so your script was trying to run the Unix/Linux version of `find`. My thinking here is that the script should work on any reasonably healthy Windows system regardless of how the PATH variable has been manipulated. This is an important point because my audience likely does not have control over the PATH environment variable ... in fact, they'd likely have no idea what it is or what it does.

So what I did was I replaced "find " with "%WINDIR%\system32\find.exe " everywhere in the script and that seems to have done the trick.

I noted calls to other non-built-in commands (i.e. - not things like ECHO) and none of them seemed likely to me to collide with something spawned in a different ocean. In any case, I offer this for what it's worth... :) Please consider incorporating it if you will.
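A minimal sketch of that substitution, for anyone who wants to try it. The variable name `findExe` is my own and does not appear in HASHSUM.BAT, and the extension test is only adapted from the kind of check the script performs on a manifest name.

```batch
@echo off
setlocal
:: Hypothetical helper variable, not part of HASHSUM.BAT:
:: the absolute path to the Windows FIND, so PATH order no longer matters
set "findExe=%WINDIR%\system32\find.exe"

:: Same kind of test the script runs on a manifest extension
echo *.md5*.sha1*.sha256*.sha384*.sha512*|"%findExe%" /i "*.md5*" >nul && echo Extension recognized
```

Defining the path once in a variable keeps the change to a single line if the location ever needs adjusting.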

Thanks again for writing this script!

Best,
Mike

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#32 Post by aGerman » 01 Nov 2019 16:12

Most likely you corrupted your PATH environment variable.

Steffen

pieh-ejdsch
Posts: 240
Joined: 04 Mar 2014 11:14
Location: germany

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#33 Post by pieh-ejdsch » 03 Nov 2019 15:38

At the beginning of the script you only need to insert a single line so that the PATH variable is used correctly. Then you do not have to hard-code the path to find.exe anywhere.

Code:

call set  path=%%WINDIR%%\system32;%%path%%
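For comparison, here is a sketch of the same idea written as a plain SET placed after the script's SETLOCAL, so the change disappears when the script ends. It assumes %WINDIR% points at a standard Windows directory. The WHERE call is only there to show which copies of find.exe are visible.

```batch
@echo off
setlocal disableDelayedExpansion
:: Search the Windows system directory before anything else on PATH
set "path=%WINDIR%\system32;%path%"

:: Show which find.exe copies are visible after the change
where find.exe
```

Because the assignment sits inside SETLOCAL, the caller's PATH is left untouched once the script exits.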

andresp
Posts: 1
Joined: 30 Apr 2020 12:27

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#34 Post by andresp » 30 Apr 2020 12:59

Hello,

First of all, I want to thank you for making this tool available to everyone.

I have had problems with Chinese file names (Unicode, UTF-8), and with a few changes to the code I have been able to make it work with no apparent problems.

Quickly tested with these commands from a directory with weird file names and subdirectory names:

hashsum_unicode.bat /s /a md5 "*.*" > ..\pp.md5
hashsum_unicode.bat /c ..\pp.md5

I will copy the final code below. Hope this helps someone.
Greetings

Code:

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment
@goto :Batch

::::
::::HASHSUM.BAT history
::::
::::  v1.6 2019-02-26 - Modify /F and /FR to support non-ASCII characters
::::  v1.5 2018-02-18 - Added /H, /F, /FR and /NH options.
::::  v1.4 2016-12-26 - Convert /A value to upper case because some Windows
::::                    versions are case sensitive. Also improve JScript file
::::                    read performance by reading 1000000 bytes instead of 1.
::::  v1.3 2016-12-17 - Bug fixes: Eliminate unwanted \r\n from temp file by
::::                    reading stdin with JScript instead of FINDSTR.
::::                    Fix help to ignore history.
::::  v1.2 2016-12-07 - Bug fixes: Exclude FORFILES directories and
::::                    correct setlocal/endlocal management in :getOptions
::::  v1.1 2016-12-06 - New /V option, and minor bug fixes.
::::  v1.0 2016-12-05 - Original release
:::
:::HASHSUM  [/Option [Value]]... [File]...
:::
:::  Print or check file hashes using any of the following standard
:::  hash algorithms: MD5, SHA1, SHA256, SHA384, or SHA512.
:::
:::  HASHSUM always does a binary read - \r\n is never converted to \n.
:::
:::  In the absence of /C, HASHSUM computes the hash for each File, and writes
:::  a manifest of the results. Each line of output consists of the hash value,
:::  followed by a space and an asterisk, followed by the File name. The default
:::  hash algorithm is sha256. File may include wildcards, but must not contain
:::  any path information.
:::
:::  If File is not given, then read from standard input and write the hash
:::  value only, without the trailing space, asterisk, or file name.
:::
:::  Options:
:::
:::    /? - Prints this help information to standard output.
:::
:::    /?? - Prints paged help using MORE.
:::
:::    /V - Prints the HASHSUM.BAT version.
:::
:::    /H - Prints the HASHSUM.BAT history.
:::
:::    /A Algorithm
:::
:::         Specifies one of the following hash algorithms:
:::         MD5, SHA1, SHA256, SHA384, SHA512
:::
:::    /P RootPath
:::
:::         Specifies the root path for operations.
:::         The default is the current directory.
:::
:::    /S - Recurse into all Subdirectories. The relative path from the root
:::         is included in the file name output.
:::         This option is ignored if used with /C.
:::
:::    /I - Include the RootPath in the file name output.
:::         This option is ignored if used with /C.
:::
:::    /T - Writes a space before each file name, rather than an
:::         asterisk. However, files are still read in binary mode.
:::         This option is ignored if used with /C.
:::
:::    /C - Read hash values and file names from File (the manifest), and verify
:::         that local files match. File may include path information with /C.
:::
:::         If File is not given, then read hash and file names from standard
:::         input. Each line of input must have a hash, followed by two spaces,
:::         or a space and an asterisk, followed by a file name.
:::
:::         If /A is not specified, then the algorithm is determined by the
:::         File extension. If the extension is not a valid algorithm, then
:::         the algorithm is derived based on the length of the first hash
:::         within File.
:::
:::         Returns ERRORLEVEL 1 if any manifest File is not found or is invalid,
:::         or if any local file is missing or does not match the hash value in
:::         the manifest. If all files are found and match, then returns 0.
:::
:::    /F FileName
:::
:::         When using /C, only check lines within the manifest that contain the
:::         string FileName. The search ignores case.
:::
:::    /FR FileRegEx
:::
:::         When using /C, only check lines within the manifest that match the
:::         FINDSTR regular expression FileRegEx. The search ignores case.
:::
:::    /NH - (No Headers)  Suppresses listing of manifest name(s) when using /C.
:::
:::    /NE - (No Errors) Suppresses error messages when using /C.
:::
:::    /NM - (No Matches) Suppresses listing of matching files when using /C.
:::
:::    /NS - (No Summary) Suppresses summary information when using /C.
:::
:::    /Q  - (Quiet) Suppresses all output when using /C.
:::
:::HASHSUM.BAT version 1.6 was written by Dave Benham
:::maintained at http://www.dostips.com/forum/viewtopic.php?f=3&t=7592

============= :Batch portion ===========
@echo off
setlocal disableDelayedExpansion
chcp 65001>NUL


:: Define options
set "options= /A:"" /C: /I: /P:"" /S: /T: /?: /??: /NH: /NE: /NM: /NS: /Q: /V: /H: /F:"" /FR:"" "

:: Set default option values
for %%O in (%options%) do for /f "tokens=1,* delims=:" %%A in ("%%O") do set "%%A=%%~B"
set "/?="
set "/??="

:getOptions
if not "%~1"=="" (
  set "test=%~1"
  setlocal enableDelayedExpansion
  if "!test:~0,1!" neq "/" endlocal & goto :endOptions
  set "test=!options:*%~1:=! "
  if "!test!"=="!options! " (
      endlocal
      >&2 echo Invalid option %~1
      exit /b 1
  ) else if "!test:~0,1!"==" " (
      endlocal
      set "%~1=1"
  ) else (
      endlocal
      set "%~1=%~2"
      shift /1
  )
  shift /1
  goto :getOptions
)
:endOptions

:: Display paged help
if defined /?? (
  (for /f "delims=: tokens=*" %%A in ('findstr "^:::[^:] ^:::$" "%~f0"') do @echo(%%A)|more /e
  exit /b 0
) 2>nul

:: Display help
if defined /? (
  for /f "delims=: tokens=*" %%A in ('findstr "^:::[^:] ^:::$" "%~f0"') do echo(%%A
  exit /b 0
)

:: Display version
if defined /V (
  for /f "delims=: tokens=*" %%A in ('findstr /ric:"^:::%~nx0 version" "%~f0"') do echo(%%A
  exit /b 0
)

:: Display history
if defined /H (
  for /f "delims=: tokens=*" %%A in ('findstr "^::::" "%~f0"') do echo(%%A
  exit /b 0
)

:: If no file specified, then read stdin and write to a temp file
set "tempFile="
if "%~1" equ "" set "tempFile=%~nx0.%time::=_%.%random%.tmp"
if defined tempFile cscript //nologo //E:JScript "%~f0" "%temp%\%tempFile%"

if defined /P cd /d "%/P%" || exit /b 1
if defined /C goto :check

:generate
if defined tempFile cd /d "%temp%"
if not defined /A set "/A=sha256"
if defined /S set "/S=/s"
if defined /T (set "/T= ") else set "/T=*"
call :defineEmpty
:: Initialize the return code before any branch can skip past it
set "rtn=0"
if not defined /P goto :generateLoop
if not defined /I goto :generateLoop
if "%/P:~-1%" equ "\" (set "/I=%/P:\=/%") else set "/I=%/P:\=/%/"

:generateLoop
(
  for /f "delims=" %%F in (
    'forfiles %/s% /m "%tempFile%%~1" /c "cmd  /c if @isdir==FALSE echo @relpath" 2^>nul'
  ) do for /f "delims=" %%A in (
    'certutil.exe -hashfile %%F %/A% ^| findstr /v ":" ^|^| if %%~zF gtr 0 (echo X^) else echo %empty%'
  ) do (
    set "file=%%~F"
    set "hash=%%A"
    setlocal enableDelayedExpansion
    set "file=!file:~2!"
    
    REM echo "debug: filefull=%%F"
    REM echo "debug: file=%%F"
    REM echo "debug: hash=%%A"
    if defined tempFile (
      if !hash! equ X (
        set "rtn=1"
        echo ERROR
      ) else echo !hash: =!
    ) else (
      if !hash! equ X (
        set "rtn=1"
        echo ERROR: !/I!!file!
      ) else echo !hash: =! !/T!!/I!!file:\=/!
    )
    endlocal
  )
) || (
  set "rtn=1"
  echo MISSING: %/T%%1
)
shift /1
if "%~1" neq "" goto :generateLoop
if defined tempFile del "%tempFile%"
exit /b %rtn%

:check
if defined /Q for %%V in (/NE /NM /NS /NH) do set "%%V=1"
if defined /F if defined /FR (
  >&2 echo ERROR: /F and /FR cannot be combined
  exit /b 1
)
set "searchTemp="
if defined /F (
  set "searchTemp=%temp%\%~nx0.%time::=_%.%random%.search.tmp"
  setlocal enableDelayedExpansion
  (echo(!/F!) > "!searchTemp!"
  endlocal
  set "file="    & set "freg=rem" & set "norm=rem"
) else if defined /FR (
  set "searchTemp=%temp%\%~nx0.%time::=_%.%random%.search.tmp"
  setlocal enableDelayedExpansion
  (echo(!/FR!) > "!searchTemp!"
  endlocal
  set "file=rem" & set "freg="    & set "norm=rem"
) else (
  set "file=rem" & set "freg=rem" & set "norm="
)
set /a manifestCnt=missingManifestCnt=invalidCnt=missingCnt=failCnt=okCnt=0

:: Remember whether the user supplied /A, so :checkFile knows when to auto-detect
set "algorithm=%/A%"

:checkLoop
if defined tempFile set "tempFile=%temp%\%tempFile%"
for %%F in ("%tempFile%%~1") do call :checkFile "%%~F"
if defined tempFile del "%tempFile%"
shift /1
if "%~1" neq "" goto :checkLoop

if defined searchTemp del "%searchTemp%"

if not defined /NS (
  echo ==========  SUMMARY  ==========
  echo Total manifests   = %manifestCnt%
  echo Matched files     = %okCnt%
  echo(
  if %missingManifestCnt% gtr 0 echo Missing manifests = %missingManifestCnt%
  if %invalidCnt% gtr 0         echo Invalid manifests = %invalidCnt%
  if %missingCnt% gtr 0         echo Missing files     = %missingCnt%
  if %failCnt% gtr 0            echo Failed files      = %failCnt%

  
)
set /a "1/(missingManifestCnt+invalidCnt+missingCnt+failCnt)" 2>nul && (
  echo(
  exit /b 1
)
exit /b 0

:checkFile

set /a manifestCnt+=1
if not defined /NH if defined tempfile (echo ----------  ^<stdin^>  ----------) else echo ----------  %1  ----------
if not defined algorithm set "/A="
if not defined /A echo *.md5*.sha1*.sha256*.sha384*.sha512*|find /i "*%~x1*" >nul && for /f "delims=." %%A in ("%~x1") do set "/A=%%A"
findstr /virc:"^[0123456789abcdef][0123456789abcdef]* [ *][^ *?|<>]" %1 >nul 2>nul && (
  if not defined /NE if defined tempFile (echo *INVALID: ^<stdin^>) else echo *INVALID: %1
  set /a invalidCnt+=1
  exit /b
)
(
  %norm% for /f "usebackq tokens=1* delims=* " %%A in (%1) do (
  %file% for /f "tokens=1* delims=* " %%A in ('type %1 ^| findstr /ilg:"%searchTemp%"') do (
  %freg% for /f "tokens=1* delims=* " %%A in ('type %1 ^| findstr /irg:"%searchTemp%"') do (
    set "hash0=%%A"
    set "fileName=%%B"
    if defined /A (call :defineEmpty) else call :determineFormat
    setlocal enableDelayedExpansion
    set "fileName=!fileName:/=\!"
    for /f "tokens=1* delims=" %%C in (
      'certutil.exe -hashfile "!fileName!" !/A! ^| findstr /v ":" ^|^| if exist "!fileName!" (echo !empty!^) else echo X'
    ) do (
      set "hash=%%C"
    )
    if /i "!hash0!" equ "!hash: =!" (
      if not defined /NM echo OK: !fileName!
      endlocal
      set /a okCnt+=1
    ) else if !hash! equ X (
      if not defined /NE echo *MISSING: !fileName!
      endlocal
      set /a missingCnt+=1
    ) else (
      if not defined /NE echo *FAILED: !fileName!
      endlocal
      set /a failCnt+=1
    )
  )
) 2>nul || if not defined /F if not defined /FR (
  if not defined /NE echo *MISSING: %1
  set /a missingManifestCnt+=1
)
exit /b

:determineFormat
if "%hash0:~127%" neq "" (
  set "/A=SHA512"
) else if "%hash0:~95%" neq "" (
  set "/A=SHA384"
) else if "%hash0:~63%" neq "" (
  set "/A=SHA256"
) else if "%hash0:~39%" neq "" (
  set "/A=SHA1"
) else set "/A=MD5"

:defineEmpty
if /i "%/A%"=="md5" (
  set "empty=d41d8cd98f00b204e9800998ecf8427e"
  set "/A=MD5"
) else if /i "%/A%"=="sha1" (
  set "empty=da39a3ee5e6b4b0d3255bfef95601890afd80709"
  set "/A=SHA1"
) else if /i "%/A%"=="sha256" (
  set "empty=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  set "/A=SHA256"
) else if /i "%/A%"=="sha384" (
  set "empty=38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b"
  set "/A=SHA384"
) else if /i "%/A%"=="sha512" (
  set "empty=cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"
  set "/A=SHA512"
) else (
  echo ERROR: Invalid /A algorithm>&2
  (goto) 2>nul&exit /b 1
)
exit /b


************* JScript portion **********/
var fso = new ActiveXObject("Scripting.FileSystemObject");
var out = fso.OpenTextFile(WScript.Arguments(0),2,true);
var chr;
while( !WScript.StdIn.AtEndOfStream ) {
  chr=WScript.StdIn.Read(1000000);
  out.Write(chr);
}

jfl
Posts: 226
Joined: 26 Oct 2012 06:40
Location: Saint Hilaire du Touvet, France

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#35 Post by jfl » 01 May 2020 08:03

Is your console using code page 65001?
This script uses sub-commands and pipes internally. Any text that is piped from one application to another, or passed as an argument to a sub-command, is converted to the console code page for that.
So if your code page does not include Chinese characters, they will be lost in the process.
Code page 65001 is the only one that preserves all Unicode characters.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#36 Post by dbenham » 01 May 2020 23:22

jfl wrote:
01 May 2020 08:03
Any text that is piped from one application to another, or passed as an argument to a sub-command, is converted to the console code page for that.
That simply is not true - pipes do not in and of themselves do any type of transformation. They can handle binary data just fine. Beyond that, the script does not use pipes to transfer file content. But the utility does have an option to read from stdin, so you can pipe data into hashsum if you desire.

File system names and command line text can indeed have issues with the code page, depending on the characters that are needed. I was careful to make sure that hashsum works with any binary content, but I did not do any coding to account for characters in filenames that might be out of scope for the code page.

Andresp did indeed modify the code to utilize code page 65001, so I can see how it could work better with international file names. But the changes are not needed if all filenames are valid with the current code page.


Dave Benham

jfl
Posts: 226
Joined: 26 Oct 2012 06:40
Location: Saint Hilaire du Touvet, France

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#37 Post by jfl » 02 May 2020 08:38

Hi Dave
dbenham wrote:
01 May 2020 23:22
pipes do not in and of themselves do any type of transformation.
You're splitting hairs! OK, it's not actually the pipe that does the transformation, it's the batch interpreter that does it before passing the data to the pipe.
In practice, the effect is the same: Any pipe in a batch involves conversion of the UTF-16 Unicode strings used internally by cmd.exe to 8-bit strings encoded in the current console code page.

Code:

C:\JFL\Proj\Non-ASCII>chcp
Active code page: 437

C:\JFL\Proj\Non-ASCII>dir
 Volume in drive C has no label.
 Volume Serial Number is B4F9-E8AF

 Directory of C:\JFL\Proj\Non-ASCII

2018-02-12  17:37    <DIR>          .
2018-02-12  17:37    <DIR>          ..
2017-04-13  18:07    <DIR>          Arabic العربية
2017-04-13  18:07    <DIR>          Chinese 中文
2017-08-17  14:34    <DIR>          Czech Čeština
2017-08-17  14:35    <DIR>          Español Spanish
2017-03-22  23:18    <DIR>          French Français
2017-03-15  21:00    <DIR>          German Deutsch
2017-08-17  14:36    <DIR>          Greek Ελληνικά
2017-04-13  18:08    <DIR>          Hebrew עִבְרִית
2017-08-17  14:37    <DIR>          Hindi हिन्दी
2017-04-13  18:09    <DIR>          Japanese 日本語
2017-08-17  14:37    <DIR>          Korean 한국어
2017-04-13  18:06    <DIR>          Russian Русский
2017-08-18  18:09    <DIR>          Thai ภาษาไทย
2017-03-15  17:13               992 ansi.txt
2017-03-12  13:51               105 README.txt
2018-02-12  17:37             4,937 test.tar.gz
2017-03-15  15:37             1,986 utf16.txt
2017-03-15  17:33             1,251 utf7.txt
2017-03-15  13:14             1,063 utf8.txt
               6 File(s)         10,334 bytes
              15 Dir(s)  336,079,147,008 bytes free

C:\JFL\Proj\Non-ASCII> dir | more
 Volume in drive C has no label.
 Volume Serial Number is B4F9-E8AF

 Directory of C:\JFL\Proj\Non-ASCII

2018-02-12  17:37    <DIR>          .
2018-02-12  17:37    <DIR>          ..
2017-04-13  18:07    <DIR>          Arabic ???????
2017-04-13  18:07    <DIR>          Chinese ??
2017-08-17  14:34    <DIR>          Czech Cestina
2017-08-17  14:35    <DIR>          Español Spanish
2017-03-22  23:18    <DIR>          French Français
2017-03-15  21:00    <DIR>          German Deutsch
2017-08-17  14:36    <DIR>          Greek ε???????
2017-04-13  18:08    <DIR>          Hebrew ????????
2017-08-17  14:37    <DIR>          Hindi ??????
2017-04-13  18:09    <DIR>          Japanese ???
2017-08-17  14:37    <DIR>          Korean ???
2017-04-13  18:06    <DIR>          Russian ???????
2017-08-18  18:09    <DIR>          Thai ???????
2017-03-15  17:13               992 ansi.txt
2017-03-12  13:51               105 README.txt
2018-02-12  17:37             4,937 test.tar.gz
2017-03-15  15:37             1,986 utf16.txt
2017-03-15  17:33             1,251 utf7.txt
2017-03-15  13:14             1,063 utf8.txt
               6 File(s)         10,334 bytes
              15 Dir(s)  336,079,147,008 bytes free


C:\JFL\Proj\Non-ASCII>chcp 65001
Active code page: 65001

C:\JFL\Proj\Non-ASCII>dir
 Volume in drive C has no label.
 Volume Serial Number is B4F9-E8AF

 Directory of C:\JFL\Proj\Non-ASCII

2018-02-12  17:37    <DIR>          .
2018-02-12  17:37    <DIR>          ..
2017-04-13  18:07    <DIR>          Arabic العربية
2017-04-13  18:07    <DIR>          Chinese 中文
2017-08-17  14:34    <DIR>          Czech Čeština
2017-08-17  14:35    <DIR>          Español Spanish
2017-03-22  23:18    <DIR>          French Français
2017-03-15  21:00    <DIR>          German Deutsch
2017-08-17  14:36    <DIR>          Greek Ελληνικά
2017-04-13  18:08    <DIR>          Hebrew עִבְרִית
2017-08-17  14:37    <DIR>          Hindi हिन्दी
2017-04-13  18:09    <DIR>          Japanese 日本語
2017-08-17  14:37    <DIR>          Korean 한국어
2017-04-13  18:06    <DIR>          Russian Русский
2017-08-18  18:09    <DIR>          Thai ภาษาไทย
2017-03-15  17:13               992 ansi.txt
2017-03-12  13:51               105 README.txt
2018-02-12  17:37             4,937 test.tar.gz
2017-03-15  15:37             1,986 utf16.txt
2017-03-15  17:33             1,251 utf7.txt
2017-03-15  13:14             1,063 utf8.txt
               6 File(s)         10,334 bytes
              15 Dir(s)  336,078,675,968 bytes free

C:\JFL\Proj\Non-ASCII>dir | more
 Volume in drive C has no label.
 Volume Serial Number is B4F9-E8AF

 Directory of C:\JFL\Proj\Non-ASCII

2018-02-12  17:37    <DIR>          .
2018-02-12  17:37    <DIR>          ..
2017-04-13  18:07    <DIR>          Arabic العربية
2017-04-13  18:07    <DIR>          Chinese 中文
2017-08-17  14:34    <DIR>          Czech Čeština
2017-08-17  14:35    <DIR>          Español Spanish
2017-03-22  23:18    <DIR>          French Français
2017-03-15  21:00    <DIR>          German Deutsch
2017-08-17  14:36    <DIR>          Greek Ελληνικά
2017-04-13  18:08    <DIR>          Hebrew עִבְרִית
2017-08-17  14:37    <DIR>          Hindi हिन्दी
2017-04-13  18:09    <DIR>          Japanese 日本語
2017-08-17  14:37    <DIR>          Korean 한국어
2017-04-13  18:06    <DIR>          Russian Русский
2017-08-18  18:09    <DIR>          Thai ภาษาไทย
2017-03-15  17:13               992 ansi.txt
2017-03-12  13:51               105 README.txt
2018-02-12  17:37             4,937 test.tar.gz
2017-03-15  15:37             1,986 utf16.txt
2017-03-15  17:33             1,251 utf7.txt
2017-03-15  13:14             1,063 utf8.txt
               6 File(s)         10,334 bytes
              15 Dir(s)  336,078,675,968 bytes free


C:\JFL\Proj\Non-ASCII>
Exactly the same happens for argument strings passed by batch to sub-commands.

The only case where there is no conversion is in pipes between two external sub-commands.
Ex: myprog1.exe | myprog2.exe
But even in this case, the standard input and arguments of myprog1.exe ARE converted by cmd to the console code page, and the output of myprog2.exe to the console IS converted from the code page to UTF-16, which trashes all characters that are not available in that console code page. And de facto the input of myprog2.exe IS encoded in the current console code page.

I'm very sensitive to all that because most third-party command-line programs mutilate my first name when displaying my home directory name in the Windows console, even more so when their output goes through a pipe. And I hate that. :evil:
To avoid that, programs that want to maximize their chance of correctly displaying characters in the user's language (French for me, but it would be the same for Chinese on a Chinese version of Windows) must assume that their console AND piped input and/or output are encoded in the current console code page. (Which is unfortunately never 65001 by default.)
This is what my SysToolsLib programs do, and they display French characters (for me) in file names correctly in all cases, whatever the console code page, and whether in pipes or not. All known ports of Unix tools to Windows fail miserably at one or the other, and usually at both.

Jean-François

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#38 Post by dbenham » 02 May 2020 09:07

Thanks for the clarification. Yes, all internal command IO as well as any command line text is subject to transformation. But are you sure about the following:
jfl wrote:...the standard input and arguments of myprog1.exe ARE converted by cmd to the console code page
Arguments, absolutely, but surely an external command can read a binary file redirected to stdin.

Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#39 Post by dbenham » 02 May 2020 15:16

@jfl - I see why you changed to code page 65001, though I think the original code page should be saved before changing, and then restored at the end.

But why did you substitute FINDSTR for FIND in two places? Is that really necessary? I'm thinking my original code used FIND for a reason, though I can't remember what it might have been.


Dave Benham

jfl
Posts: 226
Joined: 26 Oct 2012 06:40
Location: Saint Hilaire du Touvet, France

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#40 Post by jfl » 03 May 2020 07:11

dbenham wrote:
02 May 2020 09:07
surely an external command can read a binary file redirected to stdin.
This is deviating way off the main topic of this thread, but yes, this is a problem.
(Here we're talking about programs that eat/produce/filter text, not binary data like sound or images.)
I tried many things, then ended up using this heuristic. It's a bit complex, but it works very well in practice:
  • Use UTF-8 internally for everything. (UTF-16 or UTF-32 would work just as well, but they're less convenient to use in C.)
  • Convert the command line from the current console code page to UTF-8 before testing the arguments.
  • If stdin is the console or a pipe, assume it's encoded either in the current console code page (CP 437 by default on USA Windows, 850 on French Windows), or in UTF-8.
  • Else stdin is a file. Assume it's encoded either in the Windows system code page (CP 1252 on both USA and French Windows), or in UTF-8.
  • In both cases, peek the first 3 bytes. If they're a UTF-8 byte-order mark, bingo, it's UTF-8, and can be used as it is.
  • Else peek a few hundred bytes, and test non-ASCII bytes for UTF-8 compliance. If they are, again assume all the text is in UTF-8.
  • Else assume the text is encoded in the most likely code page determined at first, and transcode it internally to UTF-8 before use.
  • If stdout is the console, then change the stdout file mode to 16-bits, and output UTF-16. (This allows outputting any Unicode character, without having to change the console code page.)
  • If stdout is a pipe, then transcode internally the output to the current console code page.
  • Else stdout is a file. Transcode the output to the Windows system code page, so that all versions of Notepad can read it.
    (Now that Notepad auto-detects UTF-8 on Windows 10, I'm seriously considering writing UTF-8 text files on Windows 10, while keeping the old behavior on Windows <= 8.)
I'm well aware that it's easy to find cases where this heuristic won't work. This is why I added to my tools options for forcing other encodings on stdout: -O for the OEM encoding (= the default console code page), -A for the ANSI encoding (= the Windows system code page), and -U for UTF-8. In practice, I never needed -O, very rarely -A, and occasionally -U.

Finally all this is possible for C or C++ programs, but not for batch scripts: I don't think there's a way in Batch to distinguish if stdin and stdout are the console or a pipe or a file.
In other scripting languages like PowerShell or Python, it's probably possible to do it.
In JavaScript I'm not sure, but if it is, I encourage you to do so. This will make life easier for non-English speakers.
dbenham wrote:
02 May 2020 09:07
But why did you substitute FINDSTR for FIND in two places?
I don't understand. Was this a question for me?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#41 Post by dbenham » 03 May 2020 10:04

No, sorry Jean-François. I copied the wrong id. That last question was meant for andresp. I'll try again :oops:

@andresp - I see why you changed to code page 65001, though I think the original code page should be saved before changing, and then restored at the end.

But why did you substitute FINDSTR for FIND in two places? Is that really necessary? I'm thinking my original code used FIND for a reason, though I can't remember what it might have been.
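The save-and-restore idea could be sketched as follows. This is an assumption-laden outline rather than a tested patch to HASHSUM.BAT: the variable name `origCP` is made up, and the parsing assumes CHCP prints its message in the form "text: number", possibly with a trailing dot as some localized versions do.

```batch
@echo off
setlocal
:: Capture the active code page (hypothetical variable name origCP).
:: SET /A discards the leading space left over from "Active code page: 437".
for /f "tokens=2 delims=:." %%A in ('chcp') do set /a "origCP=%%A"
chcp 65001 >nul

:: ... work that needs UTF-8 file names goes here ...

:: Restore the caller's code page before leaving
chcp %origCP% >nul
exit /b 0
```

SETLOCAL does not undo a CHCP change, so in a script with several EXIT /B points the restore line would have to run before each of them.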


Dave Benham

fazzodude
Posts: 2
Joined: 21 May 2020 15:46

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#42 Post by fazzodude » 21 May 2020 15:59

Hello! Thank you @dbenham for creating this batch file. For some reason I can't get it to execute properly. When I launch it, it seems to be calling cscript.exe but it hangs there. Any ideas what to do about this? Thanks again!

tempsnip.png

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#43 Post by dbenham » 22 May 2020 08:57

fazzodude wrote:
21 May 2020 15:59
For some reason I can't get it to execute properly. When I launch it, it seems to be calling cscript.exe but it hangs there. Any ideas what to do about this?
That is the designed behavior - it is waiting for input on stdin. It will continue to wait (seemingly hang) until stdin is closed. You could enter text at that point and then end stdin by entering <Ctrl-Z><Enter> (it may take a couple of times). Then you would get the hash of the text you entered. But that is not typical usage.

A more reasonable approach would be something like ECHO HELLO|HASHSUM (note that the hash would include the \r\n at the end of the string).

But normally one or more filenames is specified, as in HASHSUM TEST.TXT or HASHSUM FILE1.TXT FILE2.TXT or HASHSUM *.TXT, etc.
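To make those usage patterns concrete, here is a short sketch of commands as they might be typed at the prompt. The file names (`*.txt`, `files.sha256`) are illustrative only, and inside another batch file each HASHSUM line that is not part of a pipe would need a CALL prefix.

```batch
rem Hash text from stdin; ECHO appends \r\n, which becomes part of the hash
echo HELLO|hashsum /a md5

rem Hash the same text without a line terminator
<nul set /p "=HELLO"|hashsum /a md5

rem Write a manifest, then verify it; /C returns ERRORLEVEL 1 on any problem
hashsum /a sha256 *.txt >files.sha256
hashsum /c /nm files.sha256 || echo Verification failed
```

The first two commands should print different hashes, since only the first input ends with a carriage return and line feed.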

There are many examples of usage in this thread. Especially read the first post in this thread. Also study the help, accessed via HASHSUM /?


Dave Benham

fazzodude
Posts: 2
Joined: 21 May 2020 15:46

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#44 Post by fazzodude » 22 May 2020 10:39

Thanks for the help @dbenham!

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: HASHSUM.BAT v1.6 - emulate md5sum, shasum, and the like

#45 Post by Squashman » 22 May 2020 11:55

fazzodude wrote:
21 May 2020 15:59
Hello! Thank you @dbenham for creating this batch file. For some reason I can't get it to execute properly. When I launch it, it seems to be calling cscript.exe but it hangs there. Any ideas what to do about this? Thanks again!


tempsnip.png
Just out of curiosity, do you know you can copy text from the console?
A much easier way to post results to the forums than creating a screenshot.
