Just for the record, here is a tentative answer to the title question. The following demonstrates a way to convert hardcoded or read-from-file UTF-8 strings to UTF-16 and store them in a regular, usable variable. On the one hand, the code is not pretty and the conversion is painfully slow. On the other hand, it does actually work (tried under xp.sp3 and win7.sp1), and it only uses reg.exe and wmic.exe, which are builtins as of xp+. As far as I can tell, it has not been attempted this way before. Maybe this inspires someone to come up with a neater, faster, pure-batch solution.
Basic idea was fairly straightforward:
1. Get the string somehow merged into the registry under HKCU\Environment.
2. Pick up the newly registered environment variable from the registry, and use it happily ever after.
Difficulties along the way:
1.a. The UTF-8 string can be saved as either UTF-8 or UTF-16LE to an external file using known tricks, previously discussed. But the natural choice for registry manipulation, "setx.exe -f", doesn't seem to do a proper codepage translation from UTF-8, nor does it take a UTF-16LE input file. The workaround was to manually build a UTF-16LE .reg file, then use reg.exe to merge it into the registry.
2.a. Once the new variable is added to the registry, Windows needs to be notified before it acknowledges it (http://support.microsoft.com/kb/104011 - How to propagate environment variables to the system). The batch code itself cannot send the expected WM_SETTINGCHANGE message. One would hope that setx.exe did that after effecting changes, but it doesn't appear to. It turns out, however, that wmic.exe does send it after environment changes.
2.b. Even once Windows is notified, the environment changes are only visible to future processes, since each current one maintains its own copy, initialized at the time it was started. So a new process is needed to pick up the changes. Unfortunately, any 'cmd' launched from the active console runs as a child process and inherits the environment of its parent (either current, or original for cmd/i), i.e. it is oblivious to system-level environment changes. One way to start a new 'cmd' process not-as-a-child is to use 'wmic process call create'.
2.c. Once the new process is started and sees the just-added environment variable, the issue remains that it has no direct way to return the value to the caller. The workaround here is to have it create a temporary file named after the variable's value, whose name can then be read back in the original batch. Since wmic starts the secondary 'cmd' asynchronously, the caller needs to wait until the callee completes.
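To make step 1.a a bit more concrete, here is a minimal Python sketch of what the hand-built .reg file is expected to contain: the FF FE byte order mark, followed by the header, the key and the "name"="value" line, all encoded as UTF-16LE. Python is used here only to inspect the bytes, it is not part of the batch solution. The function name build_reg_bytes and the variable name "demo" are made up for illustration, and the sketch ignores the trailing space that the batch 'echo' leaves after the value line.

```python
# Sketch of the byte layout of a UTF-16LE .reg file (illustrative only).
BOM_UTF16LE = b"\xff\xfe"

# 'ÿþ' is how the two BOM bytes FF FE look in codepage 1252,
# which is why the batch switches to chcp 1252 before storing them.
assert "ÿþ".encode("cp1252") == BOM_UTF16LE


def build_reg_bytes(name: str, utf8_value: bytes) -> bytes:
    """Return the bytes of a .reg file setting <name> under HKCU\\Environment."""
    value = utf8_value.decode("utf-8")  # UTF-8 bytes -> text
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[HKEY_CURRENT_USER\\Environment]",
        f'"{name}"="{value}"',
    ]
    text = "\r\n".join(lines) + "\r\n"  # CRLF line ends, as 'echo' writes them
    return BOM_UTF16LE + text.encode("utf-16-le")


# The 18 bytes of the test string (utf8.txt without its trailing CR LF).
sample = bytes.fromhex("E2 80 B9 CE B1 C3 9F C2 A9 E2 88 82 E2 82 AC E2 80 BA")
reg = build_reg_bytes("demo", sample)
print(reg[:2].hex(), len(reg))
```

Comparing such reference bytes against the .reg file left in %temp% (e.g. with a hex viewer, before the batch deletes it) is one way to check that the BOM and the codepage 65001 conversion came out as intended.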
That said, the sample set-utf8.cmd code is copied below.
Code:
:: set-utf8.cmd - convert utf-8 to utf-16 and store in an(other) variable
::
:: syntax: set-utf8 [out,ref] string-var, [in,ref] utf-8-string-var
::
:: - expected to fail on 'poison' (&%!) and illegal <:"\/|> path characters
:: which is fixable, but not relevant to the main point of this exercise
::
:: - otherwise checked ok under xp.sp3, win7.sp1.x64
@echo off & setLocal enableExtensions disableDelayedExpansion
if "%~2"=="" ( echo.
  @rem dump :: comment lines at the top of the file
  for /f "usebackq delims=" %%a in ("%~f0") do (
    set "z=%%~a" & setlocal enableDelayedExpansion
    if not "!z:~0,1!"==":" endlocal & goto :eof
    echo !z! & endlocal
  )
  endLocal & goto :eof
)
@rem save original codepage ('.' for some localized windows e.g. german)
for /f "tokens=2 delims=:." %%a in ('chcp') do @set /a "cp=%%~a"
@rem set global variables
set "hkcu.env=HKEY_CURRENT_USER\Environment"
@rem utf-16le bom, hex 'FF FE' n.b. win7 requires chcp 1252, first
chcp 1252 >nul
set "bom16le=ÿþ"
chcp %cp% >nul
call :set.utf u16 "%~2"
endLocal & set "%~1=%u16%" & goto :eof
:set.utf
setLocal enableDelayedExpansion
set "var==%time::=.%.%random%"
set "tmp8=%temp%\%var%.tmp"
set "reg16=%temp%\%var%.reg"
:: build utf-16le .reg file including bom n.b. win7 requires chcp 1252, first
chcp 1252 >nul
cmd /d /a /c (set/p "=%bom16le%") <nul >"%reg16%" 2>nul
chcp %cp% >nul
@rem save fixed header
cmd /d /u /c ^
(echo Windows Registry Editor Version 5.00) ^& ^
(echo.) ^& ^
(echo [%hkcu.env%]) >>"%reg16%"
@rem save variable, separate echo> + cmd/u type>> required for utf-8 conversion
echo "%var%"="!%~2!" >"%tmp8%"
chcp 65001>nul & cmd /u /c type "%tmp8%" >>"%reg16%" & chcp %cp%>nul
del "%tmp8%"
:: set variable in user's environment
@rem n.b. win7 sends 'operation completed successfully' to &2, therefore 2>&1
reg import "%reg16%" >nul 2>&1
:: force an environment refresh for the next cmd to pick up the new variable
@rem create another dummy variable since under xp at least
@rem - setx doesn't broadcast the necessary wm_settingchange, and anyway
@rem it only comes with the resource kit, not in the default install
@rem - wmic 'environment create' does broadcast the wm_settingchange, but
@rem sometimes hangs at exit waiting for input, therefore the <nul
wmic environment create name="%var% ",variablevalue=" ",username="%username%" <nul >nul 2>&1
:: run an external (not child) cmd to create a temp file with the utf-16 name
md "%temp%\!var!"
wmic process call create '%comspec% /v /c copy nul "%temp%\!var!\^!%var%^!.tmp"' <nul >nul 2>&1
:: wait until the external cmd completes
set "u16="
:loop
for %%u in ("%temp%\!var!\*.tmp") do set "u16=%%~nu"
if not defined u16 goto :loop
:: cleanup
rd /s /q "%temp%\!var!"
reg delete "%hkcu.env%" /v "!var!" /f >nul 2>&1
@rem this removes the other dummy variable, also forces an environment refresh
wmic environment where(name="!var! ") delete <nul >nul 2>&1
del "%reg16%"
endLocal & set "%~1=%u16%" & goto :eof
Test case using the set-utf8-test.cmd copied below, assuming the same utf8.txt file from the previous post:
Code:
@echo off & setLocal disableDelayedExpansion & echo.
:: example of reading utf-8 from external file
@rem binary contents of 'utf8.txt' must be
@rem E2 80 B9 CE B1 C3 9F C2 A9 E2 88 82 E2 82 AC E2 80 BA 0D 0A
for /f %%s in (utf8.txt) do set "ucs2.utf8=%%s"
call set-utf8 "ucs2" "ucs2.utf8"
setLocal enableDelayedExpansion
echo "!ucs2.utf8!" [utf-8] = "!ucs2!" [utf-16]
endLocal
:: example of hardcoding utf-8 in batch itself
@rem binary contents of string below in the .cmd file must be
@rem E2 80 B9 CE B1 C3 9F C2 A9 E2 88 82 E2 82 AC E2 80 BA
set "ucs2.utf8=‹αß©∂€›"
call set-utf8 "ucs2" "ucs2.utf8"
setLocal enableDelayedExpansion
echo "!ucs2.utf8!" [utf-8] = "!ucs2!" [utf-16]
endLocal
endLocal & goto :eof
This outputs:
Code:
C:\tmp>set-utf8-test
"ΓÇ╣╬▒├ƒ┬⌐ΓêéΓé¼ΓÇ║" [utf-8] = "‹αß©∂€›" [utf-16]
"ΓÇ╣╬▒├ƒ┬⌐ΓêéΓé¼ΓÇ║" [utf-8] = "‹αß©∂€›" [utf-16]
C:\tmp>
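The garbled left-hand column is what one would expect when the console shows the raw UTF-8 bytes in its OEM codepage. As a quick cross-check, the Python sketch below decodes the same 18 bytes both ways. It assumes the console was on codepage 437, which is not stated above but matches the characters shown.

```python
# Cross-check of the test output; codepage 437 is an assumption.
raw = bytes.fromhex("E2 80 B9 CE B1 C3 9F C2 A9 E2 88 82 E2 82 AC E2 80 BA")

as_oem = raw.decode("cp437")   # how an OEM-437 console renders the raw bytes
as_text = raw.decode("utf-8")  # the characters the bytes are meant to encode

print(as_oem)
print(as_text)
print(len(raw), "bytes ->", len(as_text), "characters")
```

Under that assumption the first printed line is the same string as the [utf-8] column, and the second one matches the [utf-16] column, i.e. the 18 UTF-8 bytes collapse into 7 characters once converted.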
Liviu