BizTalk Flat File schema optional attribute issue

I encountered this interesting issue, and thanks to Colin we were able to resolve it. In some situations, adding extra optional attributes to a Flat File (FF) schema in BizTalk will cause parsing problems. To get around this, you basically need to set the following properties, which relax the parsing of the attributes that break:


- parser_optimization="complexity"
- allow_early_termination="true"
- early_terminate_optional_fields="true"
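These are schema-level settings. In the schema's XSD they are stored as attributes on the `b:schemaInfo` annotation inside `xs:appinfo`. Here is a trimmed sketch showing where they go, assuming the usual BizTalk namespace for the `b` prefix. The `root_reference` value is a placeholder, and the record definitions and the other attributes the editor normally writes are omitted. Compare it against your own schema rather than pasting it in.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://schemas.microsoft.com/BizTalk/2003">
  <xs:annotation>
    <xs:appinfo>
      <!-- Schema-wide flat file settings; root_reference="Root" is a placeholder -->
      <b:schemaInfo standard="Flat File"
                    root_reference="Root"
                    parser_optimization="complexity"
                    allow_early_termination="true"
                    early_terminate_optional_fields="true" />
    </xs:appinfo>
  </xs:annotation>
  <!-- Record and field definitions omitted -->
</xs:schema>
```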

This got me thinking, and I wanted to understand what changing these attributes means under the covers. Below is what I found out about each of them.

On parser_optimization:

Setting parser_optimization to complexity essentially generates a more complicated grammar (it uses both top-down and bottom-up parsing); this grammar is then used to parse the FF.
The more complicated grammar is better at parsing records with more optional, nested elements; however, it still cannot handle every layout and can still break in some situations.
And given the runtime is doing more work, this will be slower than the other option, ‘speed’ (yeah, no kidding, Sherlock!).
- The ‘speed’ option is faster because it uses top-down parsing only.
In addition, you should also set lookahead_depth to zero (more on this below) to avoid validation failures against the schema when there are many optional nodes in the same group or record.

Changing lookahead_depth itself is trivial, but you need to be aware of what it means:

It tells the parser how far ahead in the token stream to look when making a parsing prediction.
Setting it to zero essentially means ‘infinite lookahead’, which in turn means more memory will be consumed.
Depending on how busy your BizTalk servers are and how much memory pressure you already experience processing various files (and their sizes), this might be an issue.
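Combining the two recommendations, the relevant attributes on `b:schemaInfo` would look something like the fragment below. Every other attribute the editor writes is omitted and should stay at whatever your schema already has. The inline `xmlns:b` declaration is only there so that the fragment stands on its own; in a real schema the prefix is normally declared on `xs:schema`.

```xml
<!-- Fragment: belongs inside xs:annotation/xs:appinfo at the schema root -->
<b:schemaInfo xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
              standard="Flat File"
              parser_optimization="complexity"
              lookahead_depth="0" />
```

Because a depth of zero lifts the lookahead limit, it is worth checking memory usage under realistic file sizes before rolling it out.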

Basically, the FF parser is a streaming parser that works from a context-free grammar (CFG) and builds a leftmost derivation. When we set lookahead_depth to zero, we no longer restrict how far ahead it may look, and the parser can perhaps recognize tokens using a DFA (of course, we don’t know the real implementation).

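For those who are old school like me and have played with yacc: yacc builds an LALR(1) parser, which is bottom-up, whereas a top-down parser with the same limit is LL(1). Either way, the ‘(1)’ means the parser decides its next step by looking only one token ahead.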

On allow_early_termination="true":

When working with positional FFs, BizTalk expects every line to be the same length (either because the data fills each field or because it is padded with spaces). However, if it finds a newline (CR + LF) before the line is complete, parsing breaks and you get an error along the lines of “Unexpected data found while looking for: \r\n”.
Setting allow_early_termination to true fixes this. Read more here.
Also note that only the right-most positional field is allowed to terminate early.
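To make “right-most positional field” concrete, here is a sketch of a positional record with two fields. The record name, field names and lengths are hypothetical, and the `xs` and `b` prefixes are assumed to be declared on the enclosing `xs:schema` as usual. With allow_early_termination set to true, only `Notes` (the last field) may be cut short by an early CR LF. Per the rule above, a line that ends inside `AccountId` would still fail.

```xml
<xs:element name="Detail">
  <xs:annotation>
    <xs:appinfo>
      <b:recordInfo structure="positional" />
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:sequence>
      <!-- Fixed 10-character field: must always be fully present -->
      <xs:element name="AccountId" type="xs:string">
        <xs:annotation>
          <xs:appinfo>
            <b:fieldInfo justification="left" pos_offset="0" pos_length="10" />
          </xs:appinfo>
        </xs:annotation>
      </xs:element>
      <!-- Right-most field: the only one allowed to terminate early -->
      <xs:element name="Notes" type="xs:string">
        <xs:annotation>
          <xs:appinfo>
            <b:fieldInfo justification="left" pos_offset="0" pos_length="40" />
          </xs:appinfo>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```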

Lastly, the early_terminate_optional_fields attribute enables early termination of optional trailing fields. A couple of points to note on this:

If your schema does not have this annotation and you open it in the BizTalk editor, the editor will automatically add the annotation explicitly and set it to its default value of False.
This only takes effect if you also have the allow_early_termination annotation set to True.
More details on this here.

And in case you were wondering, this is an option supported by Microsoft, as shown in this KB article.