"TermStrWidth" utility – measure the number of cells a string occupies in the CLI.
Posted: 06 Aug 2024 16:09
What?
This little tool measures the display width of strings in the Windows Terminal / Console.
It is basically a copy of the recently revised code for this purpose in Microsoft's open source repository. All credit goes to their Terminal team.
I used this code so that the tool gets exactly the same results as the measurements the Terminal performs internally. For more information, refer to the comments in the C source file, which is part of the attached zip archive.
Note that at the time of writing, the new algorithm has only been merged into the Terminal/Console code and is not yet part of any official release (apart from recent Canary builds of the Terminal). So please be patient if not all of your test cases give the right results yet. I might be a little premature with this utility.
When?
Use it in UTF-8 encoded scripts if you want to tabulate or center text and need to know in advance how many cells a string is going to occupy in the window. As I don't know whether new console features will be back-ported to Windows 10, I assume that only Windows 11 will get full support. You should also use the Windows Terminal.
How?
Just pass the string to measure as an argument to the tool:
```
TermStrWidth.exe "Test ✅"
```
If several strings are passed, a list of widths is written, one value per line.
In a UTF-8 encoded batch script, switch to code page 65001 first so that cmd.exe reads the non-ASCII characters correctly. Then use a FOR /F loop to process the output:
```
chcp 65001 >nul
for /f %%i in ('TermStrWidth.exe "Test ✅"') do echo width: %%i
```
Why?
Most of the people I see here in the forum, including myself, are used to writing Latin or Cyrillic text. For historical reasons we often still use a single-byte charset and thoughtlessly assume that
1 character
== 1 byte
== 1 code point
== 1 glyph or grapheme cluster
== 1 column for the printed glyph or terminal cluster
However, this assumption is simply wrong for half of the people in the world. Think of Chinese, Japanese, Korean, Devanagari, or Arabic script.
Now that most text editors default to UTF-8 and we are used to including all kinds of symbols and emoji in our text, the assumption no longer holds for the other half of the people either.
In the *nix world, where UTF-8 has been the de facto standard for ages, this has been taken into account for a long time. The C functions wcwidth() and wcswidth() were introduced with the POSIX.1-2001 standard to measure the display width. Although this is far better than what we have had on Windows so far, these functions only measure the width of one code point at a time (wcswidth() simply adds up the per-code-point widths), which is clearly insufficient.
The Windows Terminal now contains an algorithm that takes the context within the string into account. This tool contains the same algorithm and the same preprocessed data tables. I doubt it's perfect, but due to the lack of any standard we can't even evaluate how close to perfection it actually is. At least a proposal for how this should ideally be implemented has already been submitted to the UTC (Unicode Technical Committee, see https://www.unicode.org/L2/L2023/23107- ... -suppt.pdf).
Steffen
Test scripts included.
EDIT: I also created a repo on GitHub that provides a C interface for anyone interested. The tool released there doesn't contain the GCC hack to shrink the binary, though.
https://github.com/german-one/wtswidth- ... ring-width