Bibble Integrator, regex samples

From ptWiki

Revision as of 19:42, 5 December 2007 by Normunds (Talk | contribs)

By Normunds Kalnbērziņš (Normunds)


The regex method of defining the filename rules for originals and their corresponding versions has a complex syntax and may be intimidating at first. However, it is not possible to build a system that automatically deduces the relationship between originals and versions from filenames alone unless we indicate which naming convention we use. So there is nothing for it but to look at some regexes that might come in useful.

The syntax used in Bibble Integrator to define original/derivative naming conventions is the VBScript one, as I'm using the Microsoft component for pattern matching; for the syntax, see Regular Expression Syntax (Scripting).

Possibly the simplest useful expression would be:

(\w+\.)NEF:\1JPG

It describes an original whose name consists of one or more "word" characters "\w" (alphanumeric characters and underscore), then a dot "\.", then the NEF extension. It matches, for example, DSC_1091.NEF with DSC_1091.JPG. Special signs:

  • \w is any "word" character, that is, an alphanumeric character or underscore; it can also be written as "[a-z0-9_]"
  • + (plus) indicates one or more of the preceding "\w", i.e. one or more "word" characters
  • \. a backslash before the dot indicates that it is a literal dot, not the special character that matches any character
  • (...) the part in brackets is later referenced by \1, in this case defining that the JPG has the same name as the NEF
  • : is simply a part of our syntax to separate original and version and has no special meaning in regex
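
To experiment with this rule outside Bibble Integrator, here is a minimal Python sketch. It rests on an assumption this page does not confirm: that the whole rule is applied as one case-insensitive regular expression to the text "original:derivative". The helper name is_version_of is hypothetical, and Python's re module stands in for the VBScript engine, so small differences in behaviour are possible.

```python
import re

def is_version_of(rule, original, derivative):
    """Hypothetical helper: apply a rule to the text 'original:derivative'."""
    # Assumption: one case-insensitive search over both names joined by ":".
    # re.ASCII restricts \w to [A-Za-z0-9_], as described above.
    flags = re.IGNORECASE | re.ASCII
    return re.search(rule, f"{original}:{derivative}", flags) is not None

RULE = r"(\w+\.)NEF:\1JPG"

print(is_version_of(RULE, "DSC_1091.NEF", "DSC_1091.JPG"))  # True
print(is_version_of(RULE, "DSC_1091.NEF", "DSC_1092.JPG"))  # False: \1 must repeat "DSC_1091."
print(is_version_of(RULE, "DSC_1091.NEF", "DSC_1091.jpg"))  # True: case is ignored
```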

The next example is a slightly elaborated version of the first one:

^(\w+\.)NEF:\1(?:JPG|tif|JPEG|PNG)$

This pattern basically defines the same naming convention. In addition, it defines the pattern matching boundaries more precisely and provides a number of alternatives for the filename extension:

  • ^ indicates that our pattern starts at the beginning of the string (of the original name) and $ that it ends at the end of the string (of the derivative). Without them our original pattern would also have matched OtherFile-DSC_1091.NEF with DSC_1091.JPG2. Anchoring also speeds the search up by reducing the number of available permutations.
  • (?:JPG|tif|JPEG|PNG) indicates that the extension of the derivative can be any one of the "|"-separated list. Our implementation is case insensitive, so this will match jpg as well as TIF extensions. The ?: within the brackets is optional and slightly speeds things up by telling the parser that the expression in brackets will not be referenced later (as \2).
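
The effect of the anchors can be checked with a small Python sketch. It assumes, without confirmation from this page, that the rule is applied as a single case-insensitive regex to the text "original:derivative"; the helper is_version_of is hypothetical and Python's re module only approximates the VBScript engine.

```python
import re

def is_version_of(rule, original, derivative):
    """Hypothetical helper: apply a rule to the text 'original:derivative'."""
    # Assumption: one case-insensitive search over both names joined by ":".
    flags = re.IGNORECASE | re.ASCII
    return re.search(rule, f"{original}:{derivative}", flags) is not None

LOOSE = r"(\w+\.)NEF:\1JPG"
STRICT = r"^(\w+\.)NEF:\1(?:JPG|tif|JPEG|PNG)$"

# Without anchors the rule finds a match inside the longer names.
print(is_version_of(LOOSE, "OtherFile-DSC_1091.NEF", "DSC_1091.JPG2"))   # True
# With ^ and $ the whole of both names has to fit the rule.
print(is_version_of(STRICT, "OtherFile-DSC_1091.NEF", "DSC_1091.JPG2"))  # False
print(is_version_of(STRICT, "DSC_1091.NEF", "DSC_1091.JPG2"))            # False
# Any of the listed extensions is accepted, in any letter case.
print(is_version_of(STRICT, "DSC_1091.NEF", "DSC_1091.tif"))             # True
print(is_version_of(STRICT, "DSC_1091.NEF", "DSC_1091.jpeg"))            # True
```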

The next sample is very similar, but defines the original filename more precisely as DSC_xxxx.NEF, where xxxx is a group of 4 digits:

^(DSC_[0-9]{4}\.)NEF:\1(?:JPG|tif|JPEG|PNG)$

  • [0-9] characters in square brackets indicate a single character within the given range.
  • {4} indicates that there are exactly four characters as defined (i.e. four characters in the range of 0-9)
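
A quick way to see what {4} accepts and rejects is the following Python sketch. As an assumption not confirmed here, the rule is treated as one case-insensitive regex applied to "original:derivative"; is_version_of is a hypothetical helper and Python's re module is only a stand-in for the VBScript engine.

```python
import re

def is_version_of(rule, original, derivative):
    """Hypothetical helper: apply a rule to the text 'original:derivative'."""
    # Assumption: one case-insensitive search over both names joined by ":".
    flags = re.IGNORECASE | re.ASCII
    return re.search(rule, f"{original}:{derivative}", flags) is not None

RULE = r"^(DSC_[0-9]{4}\.)NEF:\1(?:JPG|tif|JPEG|PNG)$"

print(is_version_of(RULE, "DSC_1091.NEF", "DSC_1091.JPG"))    # True: exactly four digits
print(is_version_of(RULE, "DSC_109.NEF", "DSC_109.JPG"))      # False: only three digits
print(is_version_of(RULE, "DSC_10912.NEF", "DSC_10912.JPG"))  # False: five digits
print(is_version_of(RULE, "IMG_1091.NEF", "IMG_1091.JPG"))    # False: prefix is not DSC_
```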

In the following example the character group before _xxxx.NEF is defined as [a-z0-9]*:

^([a-z0-9]*_[0-9]{4}\.)NEF:\1(?:JPG|tif|JPEG|PNG)$

  • * indicates that the preceding character definition is repeated zero or more times. So this pattern describes both DSC_1091.NEF and _1091.NEF (no alphanumeric character is required before the underscore).

And two more examples:

^([a-z0-9]*_[0-9]{1,6}\.)NEF:\1(?:JPG|tif)$

  • [0-9]{1,6} indicates a group of one to six digits, so this pattern describes filenames such as DSC_01091.NEF and DSC_91.NEF.
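
The difference between the [a-z0-9]* pattern with {4} and the {1,6} pattern can be tried out with this Python sketch. It assumes (this page does not say so) that a rule is applied as one case-insensitive regex to "original:derivative"; is_version_of is a hypothetical helper, and Python's re module may differ in details from the VBScript engine.

```python
import re

def is_version_of(rule, original, derivative):
    """Hypothetical helper: apply a rule to the text 'original:derivative'."""
    # Assumption: one case-insensitive search over both names joined by ":".
    flags = re.IGNORECASE | re.ASCII
    return re.search(rule, f"{original}:{derivative}", flags) is not None

STAR = r"^([a-z0-9]*_[0-9]{4}\.)NEF:\1(?:JPG|tif|JPEG|PNG)$"
RANGED = r"^([a-z0-9]*_[0-9]{1,6}\.)NEF:\1(?:JPG|tif)$"

# * allows the part before the underscore to be empty.
print(is_version_of(STAR, "DSC_1091.NEF", "DSC_1091.JPG"))  # True
print(is_version_of(STAR, "_1091.NEF", "_1091.JPG"))        # True
print(is_version_of(STAR, "DSC_91.NEF", "DSC_91.JPG"))      # False: {4} needs four digits

# {1,6} accepts between one and six digits.
print(is_version_of(RANGED, "DSC_01091.NEF", "DSC_01091.JPG"))      # True
print(is_version_of(RANGED, "DSC_91.NEF", "DSC_91.tif"))            # True
print(is_version_of(RANGED, "DSC_1234567.NEF", "DSC_1234567.JPG"))  # False: seven digits
print(is_version_of(RANGED, "DSC_91.NEF", "DSC_91.PNG"))            # False: PNG is not listed
```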

This complex pattern describes an original filename that consists of four underscore-separated groups, a dot and an extension:

^\w*_([a-z]*_[a-z]*_[0-9]*)\.(?:NEF|CRW):\1(?:-[0-9]*|)\.(?:JPG|JPEG|TIFF|TIF|PNG)$

  • Note that in our sample none of these groups needs to have any characters at all (marker *), so this description will also match a filename such as ____.NEF. It would probably be better to use "+" instead, giving \w+_([a-z]+_[a-z]+_[0-9]+)\.(?:NEF|CRW), as that is a somewhat more likely description of the files we might have.
  • (?:-[0-9]*|) notice that this time the "derivative side" contains a new element, which says that the derivative name consists of the part in brackets from the original, optionally followed by "-" and 0 or more digits. So an original named mod_water_melon_15.crw will match the derivative water_melon_15-2.png: the first group "mod" is not contained in brackets, hence not referenced by \1, and the suffix -2 can be added according to the (?:-[0-9]*|) rule. Notice the bar (|) with no value on its right; in this way we make "-xxx" optional, so that the pattern also matches water_melon_15.png.
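
The behaviour described in these two points can be reproduced with a Python sketch. The assumptions are mine, not from this page: the rule is applied as one case-insensitive regex to "original:derivative", is_version_of is a hypothetical helper, and PLUS is simply the rule above with the "+" variant of the original side swapped in. Python's re module is used in place of the VBScript engine.

```python
import re

def is_version_of(rule, original, derivative):
    """Hypothetical helper: apply a rule to the text 'original:derivative'."""
    # Assumption: one case-insensitive search over both names joined by ":".
    flags = re.IGNORECASE | re.ASCII
    return re.search(rule, f"{original}:{derivative}", flags) is not None

DERIVATIVE_SIDE = r"\1(?:-[0-9]*|)\.(?:JPG|JPEG|TIFF|TIF|PNG)$"
STAR = r"^\w*_([a-z]*_[a-z]*_[0-9]*)\.(?:NEF|CRW):" + DERIVATIVE_SIDE
PLUS = r"^\w+_([a-z]+_[a-z]+_[0-9]+)\.(?:NEF|CRW):" + DERIVATIVE_SIDE

# "mod" lies outside the brackets, so it is not part of \1.
print(is_version_of(STAR, "mod_water_melon_15.crw", "water_melon_15-2.png"))    # True
print(is_version_of(STAR, "mod_water_melon_15.crw", "water_melon_15.png"))      # True: suffix is optional
print(is_version_of(STAR, "mod_water_melon_15.crw", "mod_water_melon_15.png"))  # False

# With * even a name made only of underscores fits; with + it does not.
print(is_version_of(STAR, "____.NEF", "__.JPG"))  # True
print(is_version_of(PLUS, "____.NEF", "__.JPG"))  # False
print(is_version_of(PLUS, "mod_water_melon_15.crw", "water_melon_15-2.png"))    # True
```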

Another example, provided by Ezra in the Bibble forum:

^([a-z0-9_-]*)\.NEF:\1(?:\w*)\.(?:JPG|JPEG|TIFF|TIF|PNG|PSD)$

This will match any .NEF originals whose filenames consist of alphanumeric characters, underscores and hyphens, with derivatives having the same filename, optionally extended by a suffix of alphanumeric characters and underscores.

For example originals:

  • 2007-08-01_water_skiing_0001.NEF
  • 2006_Christmas_001.NEF
  • 2005_Smith_Jones_Wedding_001.NEF
  • DSC_0054.NEF

will match derivatives named like this:

  • 2006_Christmas_001.JPG
  • 2006_Christmas_001_4x6_sRGB.JPG
  • 2006_Christmas_001_4x6_Everett.JPG
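
The pairings listed above can be verified with a short Python sketch. It assumes, which this page does not confirm, that the rule is applied as one case-insensitive regex to the text "original:derivative"; the helper versions_of is hypothetical, and Python's re module is only an approximation of the VBScript engine.

```python
import re

RULE = r"^([a-z0-9_-]*)\.NEF:\1(?:\w*)\.(?:JPG|JPEG|TIFF|TIF|PNG|PSD)$"

def versions_of(original, candidates, rule=RULE):
    """Hypothetical helper: list the candidates that the rule pairs with original."""
    # Assumption: one case-insensitive search over "original:candidate".
    flags = re.IGNORECASE | re.ASCII
    return [name for name in candidates
            if re.search(rule, f"{original}:{name}", flags)]

derivatives = [
    "2006_Christmas_001.JPG",
    "2006_Christmas_001_4x6_sRGB.JPG",
    "2006_Christmas_001_4x6_Everett.JPG",
]

print(versions_of("2006_Christmas_001.NEF", derivatives))  # all three names
print(versions_of("DSC_0054.NEF", derivatives))            # []

# The \w* suffix also accepts extra digits, so under this rule a
# neighbouring frame number would be paired with the original as well.
print(versions_of("2006_Christmas_001.NEF", ["2006_Christmas_0011.JPG"]))
```

If suffixes always start with an underscore, as in the names above, a stricter suffix such as (?:_\w+|) might avoid the last pairing; that variant is a suggestion, not part of Ezra's rule.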