How does the Windows RENAME command interpret wildcards?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#1 Post by dbenham » 16 Sep 2012 15:02

I originally posted this question and answer on StackExchange

How does the Windows RENAME (REN) command interpret wildcards?

The built-in HELP facility is of no help - it doesn't address wildcards at all.

The Microsoft TechNet XP online help isn't much better. Here is all it has to say regarding wildcards:

"You can use wildcards (* and ?) in either file name parameter. If you use wildcards in filename2, the characters represented by the wildcards will be identical to the corresponding characters in filename1."

Not much help - there are many ways that statement can be interpreted.

I've managed to successfully use wildcards in the filename2 parameter on some occasions, but it has always been trial and error. I haven't been able to anticipate what works and what doesn't. Frequently I've had to resort to writing a small batch script with a FOR loop that parses each name so that I can build each new name as needed. Not very convenient.

If I knew the rules for how wildcards are processed then I figure I could use the RENAME command more effectively without having to resort to batch as often. Of course knowing the rules would also benefit batch development.

(Yes - this is a case where I am posting a paired question and answer. I got tired of not knowing the rules and decided to experiment on my own. I figure many others may be interested in what I discovered)


==========================================================================


These rules were discovered after extensive testing on a Vista machine. No tests were done with Unicode in file names.

RENAME requires 2 parameters - a sourceMask, followed by a targetMask. Both the sourceMask and targetMask can contain * and/or ? wildcards. The behavior of the wildcards changes slightly between source and target masks.


sourceMask

The sourceMask works as a filter to determine which files are renamed. The wildcards work here the same as with any other command that filters file names.

    ? - Matches any 0 or 1 character except . This wildcard is greedy - it always consumes the next character if it is not a . However it will match nothing without failure if at name end or if the next character is a .


    * - Matches any 0 or more characters including . (with one exception below). This wildcard is not greedy. It will match as little or as much as is needed to enable subsequent characters to match.


All non-wildcard characters must match themselves, with a few special case exceptions.


    . - Matches itself or it can match the end of name (nothing) if no more characters remain. (Note - a valid Windows name cannot end with .)


    {space} - Matches itself or it can match the end of name (nothing) if no more characters remain. (Note - a valid Windows name cannot end with {space})


    *. at the end - Matches any 0 or more characters except . The terminating . can actually be any combination of . and {space} as long as the very last character in the mask is . This is the one and only exception where * does not simply match any set of characters.

The above rules are not that complex. But there is one more very important rule that makes the situation confusing: The sourceMask is compared against both the long name and the short 8.3 name (if it exists). This last rule can make interpretation of the results very tricky, because it is not always obvious when the mask is matching via the short name.

It is possible to use RegEdit to disable the generation of short 8.3 names on NTFS volumes, at which point interpretation of file mask results is much more straightforward. Any short names that were generated before disabling short names will remain.


targetMask

The targetMask specifies the new name. It is always applied to the full long name; the targetMask is never applied to the short 8.3 name, even if the sourceMask matched the short 8.3 name.

The presence or absence of wildcards in the sourceMask has no impact on how wildcards are processed in the targetMask.

In the following discussion - c represents any character that is not *, ?, or .

The targetMask is processed against the source name strictly from left to right with no back-tracking.

    c - Advances the position within the source name as long as the next character is not . and appends c to the target name. (Replaces the character that was in source with c, but never replaces .)


    ? - Matches the next character from the source long name and appends it to the target name as long as the next character is not . If the next character is . or if at the end of the source name then no character is added to the result and the current position within the source name is unchanged.


    * at end of targetMask - Appends all remaining characters from source to the target. If already at the end of source, then does nothing.


    *c - Matches all source characters from current position through the last occurrence of c (case sensitive greedy match) and appends the matched set of characters to the target name. If c is not found, then all remaining characters from source are appended, followed by c. This is the only situation I am aware of where Windows file pattern matching is case sensitive.


    *. - Matches all source characters from current position through the last occurrence of . (greedy match) and appends the matched set of characters to the target name. If . is not found, then all remaining characters from source are appended, followed by .


    *? - Appends all remaining characters from source to the target. Any additional characters after the *? in targetMask will be appended to target. If already at end of source then does nothing.


    . without * in front - Advances the position in source through the first occurrence of . without copying any characters, and appends . to the target name. If . is not found in the source, then advances to the end of source and appends . to the target name.

After the targetMask has been exhausted, any trailing . and {space} are trimmed off the end of the resulting target name because Windows file names cannot end with . or {space}
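
The targetMask rules above can be modeled in a few lines of Python. This is only a sketch of the rules as written in this post, not Windows' actual implementation. The function name apply_target_mask is made up for illustration, the model knows nothing about short 8.3 names, and it rejects ** in the mask because the rules above do not say what that should do.

```python
def apply_target_mask(source: str, mask: str) -> str:
    """Model of the targetMask rules in this post (hypothetical helper).

    Walks the mask strictly left to right with no back-tracking and
    builds the new name from the source long name.
    """
    out = []
    pos = 0  # current position within the source name
    i = 0    # current position within the mask
    while i < len(mask):
        ch = mask[i]
        nxt = mask[i + 1] if i + 1 < len(mask) else None
        if ch == "*" and nxt in (None, "?"):
            # "*" at end of mask, or "*?": copy everything that remains.
            out.append(source[pos:])
            pos = len(source)
            i += 1 if nxt is None else 2
        elif ch == "*" and nxt == "*":
            # Assumption: not described by the rules, so refuse to guess.
            raise ValueError("'**' is not covered by the rules in the post")
        elif ch == "*":
            # "*c" or "*.": greedy, case-sensitive match through the LAST
            # occurrence of the following character.
            last = source.rfind(nxt, pos)
            if last >= 0:
                out.append(source[pos:last + 1])
                pos = last + 1
            else:
                # Not found: copy the rest, then append the character.
                out.append(source[pos:] + nxt)
                pos = len(source)
            i += 2
        elif ch == "?":
            # Copy one source character unless at "." or at end of name.
            if pos < len(source) and source[pos] != ".":
                out.append(source[pos])
                pos += 1
            i += 1
        elif ch == ".":
            # Skip through the first "." in source (or to the end), emit ".".
            dot = source.find(".", pos)
            pos = dot + 1 if dot >= 0 else len(source)
            out.append(".")
            i += 1
        else:
            # Literal c: overwrite one source character, but never a ".".
            if pos < len(source) and source[pos] != ".":
                pos += 1
            out.append(ch)
            i += 1
    # Windows names cannot end with "." or space, so trim them.
    return "".join(out).rstrip(". ")


if __name__ == "__main__":
    for name, mask in [("1.txt", "A?Z*"), ("c.x.y", "*.txt"), ("b.dat", "*?.bak")]:
        print(f"{name:8} {mask:8} -> {apply_target_mask(name, mask)}")
```

Feeding the practical examples below through this function gives the same names as listed there. Applying the mask 2*3.?x to 123456789.123 and then again to the result gives 223456789.123.x followed by 223456789.123.xx, which is consistent with the double rename described at the end of this post.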


Some practical examples

Substitute a character in the 1st and 3rd positions prior to any extension (adds a 2nd or 3rd character if it doesn't exist yet)

ren  *  A?Z*
  1        -> AZ
  12       -> A2Z
  1.txt    -> AZ.txt
  12.txt   -> A2Z.txt
  123      -> A2Z
  123.txt  -> A2Z.txt
  1234     -> A2Z4
  1234.txt -> A2Z4.txt

Change the (final) extension of every file

ren  *  *.txt
  a     -> a.txt
  b.dat -> b.txt
  c.x.y -> c.x.txt

Append an extension to every file

ren  *  *?.bak
  a     -> a.bak
  b.dat -> b.dat.bak
  c.x.y -> c.x.y.bak

Remove any extra extension after the initial extension. Note that adequate ? must be used to preserve the full existing name and initial extension.

ren  *  ?????.?????
  a     -> a
  a.b   -> a.b
  a.b.c -> a.b
  part1.part2.part3    -> part1.part2
  123456.123456.123456 -> 12345.12345   (note truncated name and extension because not enough `?` were used)

Same as above, but filter out files with initial name and/or extension longer than 5 chars so that they are not truncated. (Obviously could add an additional ? on either end of targetMask to preserve names and extensions up to 6 chars long)

ren  ?????.?????.*  ?????.?????
  a      ->  a
  a.b    ->  a.b
  a.b.c  ->  a.b
  part1.part2.part3  ->  part1.part2
  123456.123456.123456  (Not renamed because doesn't match sourceMask)

Change characters after last _ in name and attempt to preserve extension. (Doesn't work properly if _ appears in extension)

ren  *_*  *_NEW.*
  abcd_12345.txt  ->  abcd_NEW.txt
  abc_newt_1.dat  ->  abc_newt_NEW.dat
  abcdef.jpg          (Not renamed because doesn't match sourceMask)
  abcd_123.a_b    ->  abcd_123.a_NEW  (not desired, but no simple RENAME form will work in this case)

Any name can be broken up into components that are delimited by . Characters may only be appended to or deleted from the end of each component. Characters cannot be deleted from or added to the beginning or middle of a component while preserving the remainder with wildcards. Substitutions are allowed anywhere.

ren  ????????????.????????????.????????????  ?x.????999.*rForTheCourse
  part1.part2              (Not renamed, doesn't match sourceMask)
  part1.part2.part3    ->  px.part999.parForTheCourse
  part1.part2.part3.part4  (Not renamed, doesn't match sourceMask)
  a.b.c                ->  ax.b999.crForTheCourse
  a.b.CarPart3BEER     ->  ax.b999.CarParForTheCourse


Possible RENAME bug - a single command may rename the same file twice!

Starting in an empty test folder:

C:\test>copy nul 123456789.123
        1 file(s) copied.

C:\test>dir /x
 Volume in drive C is OS
 Volume Serial Number is EE2C-5A11

 Directory of C:\test

09/15/2012  07:42 PM    <DIR>                       .
09/15/2012  07:42 PM    <DIR>                       ..
09/15/2012  07:42 PM                 0 123456~1.123 123456789.123
               1 File(s)              0 bytes
               2 Dir(s)  327,237,562,368 bytes free

C:\test>ren *1* 2*3.?x

C:\test>dir /x
 Volume in drive C is OS
 Volume Serial Number is EE2C-5A11

 Directory of C:\test

09/15/2012  07:42 PM    <DIR>                       .
09/15/2012  07:42 PM    <DIR>                       ..
09/15/2012  07:42 PM                 0 223456~1.XX  223456789.123.xx
               1 File(s)              0 bytes
               2 Dir(s)  327,237,562,368 bytes free

REM Expected result = 223456789.123.x

I believe the sourceMask *1* first matches the long file name, and the file is renamed to the expected result of 223456789.123.x. RENAME then continues to look for more files to process and finds the newly named file via the new short name of 223456~1.X. The file is then renamed again giving the final result of 223456789.123.xx.

If I disable 8.3 name generation then the RENAME gives the expected result.

I haven't fully worked out all of the trigger conditions that must exist to induce this odd behavior. I was concerned that it might be possible to create a never-ending recursive RENAME, but I was never able to induce one.

I believe all of the following must be true to induce the bug. Every bugged case I saw had the following conditions, but not all cases that met the following conditions were bugged.

  • Short 8.3 names must be enabled
  • The sourceMask must match the original long name.
  • The initial rename must generate a short name that also matches the sourceMask
  • The initial renamed short name must sort later than the original short name (if it existed?)


Dave Benham

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: How does the Windows RENAME command interpret wildcards?

#2 Post by Liviu » 16 Sep 2012 17:31

Interesting reading, thanks for sharing.

One distinction worth noting is that wildcards in the "sourceMask" are pretty much standard across Windows apps, since most end up using the system functions FindFirstFile/FindNextFile with their builtin rules. In contrast, wildcards in the "targetMask" are not a system standard and must be implemented independently by each app attempting to support them (since there is no multiple copy/move API, and the shell level SHFileOperation explicitly forbids wildcards in the destination).

It would be natural to expect that there'd be some consistency in handling target wildcards at least between cmd's own commands, and in fact that appears to be the case for the internal commands copy and rename. However, the external command comp.exe doesn't follow the same rules.

dbenham wrote:Change the (final) extension of every file
[...] Remove any extra extension after the initial extension.

Just as a matter of terminology, DOS/Windows filenames have exactly one "extension" (possibly empty). Dots other than the last one are part of the "name". There is no notion of multiple or cascaded extensions, for example "abc.tar.gz" has a name of "abc.tar" and an extension of ".gz". It's not possible to associate an action with ".tar.gz" files (which is why in win-world they are often times renamed to ".tgz"). It's therefore a bit confusing to refer to the "last" (or "initial") extension.

dbenham wrote:Remove any extra extension after the initial extension.

The more common way to do that is "ren *.* *.".

dbenham wrote:Possible RENAME bug - a single command may rename the same file twice!

Interesting... Confirmed under 32-bit XP.

Liviu

Re: How does the Windows RENAME command interpret wildcards?

#3 Post by dbenham » 16 Sep 2012 17:49

Liviu wrote:Just as a matter of terminology, DOS/Windows filenames have exactly one "extension" (possibly empty). Dots other than the last one are part of the "name". There is no notion of multiple or cascaded extensions...

Yes, I know I was a bit loose with my terminology. I was looking for a concise way to refer to the part of a file name that lies between two dots. Hopefully people understand what I meant.


Liviu wrote:
dbenham wrote:Remove any extra extension after the initial extension.

The more common way to do that is "ren *.* *.".

Actually that gives a different result. Your command only removes the true extension (everything after the last dot).

ren  a.b.c.d  *.
  -->  a.b.c

My command removes everything after the 2nd dot.

ren  a.b.c.d  ??????.???????
  -->  a.b

In reality I would probably use a lot more ? so that I could preserve fairly large names and/or extensions (the chars between the 1st and 2nd dots become the true extension after the rename is complete).


Dave Benham

Re: How does the Windows RENAME command interpret wildcards?

#4 Post by Liviu » 16 Sep 2012 18:58

dbenham wrote:Actually that gives a different result. Your command only removes the true extension (everything after the last dot).

You're right, and I must have missed the "a.b -> a.b" part of your example. Consider that an (involuntary) confirmation of my confusion with the terminology ;-)

I may have had another question about "sourceMask [...] ? [...] will match nothing without failure if at name end or if the next character is a ." but I might be misunderstanding that as well, since I am not sure whether "name" refers to a full filename (name + extension), just the "name" part of it (up to the last dot), or perhaps the string up to the first dot.

Liviu

Re: How does the Windows RENAME command interpret wildcards?

#5 Post by dbenham » 16 Sep 2012 19:51

My "name" was referring to the entire file name (name + extension).

I believe the ? wildcard is equivalent to the following regular expression [^.]?


Dave Benham

Re: How does the Windows RENAME command interpret wildcards?

#6 Post by Liviu » 16 Sep 2012 20:58

dbenham wrote:My "name" was referring to the entire file name (name + extension).

Thanks. The "will match nothing without failure" part was a bit ambiguous. Guess you meant...

C:\tmp\ren>dir /a-d /b
a.b.c.123
a.b.c.d

C:\tmp\ren>dir /a-d /b ?.?.?.?.?.?.*.?.*.?
a.b.c.d

I'd have rather phrased it as... well, don't have any good idea... Perhaps "trailing .*? past the end of a fully matched specification are ignored".

Liviu
