Parsing command line arguments using a finite state machine and backtracking
In the previous article, I’ve quickly presented mow.cli, a command line parsing library I created in Go and compared it against docopt and codegangsta/cli.
In this article, I’ll talk about the innards of mow.cli and how it uses finite state machines (FSM) and backtracking to handle some tricky usage strings like the cp test™ for example:
SRC... DST
Introduction
In mow.cli, you can set a spec string on a CLI application or any of its commands and sub commands.
The spec string defines the call syntax, i.e. the options and arguments it accepts and their order.
The spec string syntax is almost the same as docopt’s and POSIX commands.
For example, here’s the cp
command spec (or usage) from its man page:
CP(1) BSD General Commands Manual CP(1)
NAME
cp -- copy files
SYNOPSIS
cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file target_file
cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file ... target_directory
In mow.cli, the second usage can be expressed as the following spec string:
[-R [-H | -L | -P]] [-fi | -n] [-apvX] SRC... DST
Except for some small differences, the syntax reamins mostly the same.
Parsing a spec string
mow.cli uses a tokenizer followed by a recursive descent parser to parse a spec string.
Here’s the simplified EBNF grammar of mow.cli’s spec strings:
spec -> sequence
sequence -> choice*
req_sequence -> choice+
choice -> atom ('|' atom)*
atom -> (shortOpt | longOpt | group | optional) rep?
shortOp -> '-' [A-Za-z]
longOpt -> '--' [A-Za-z][A-Za-z0-9]*
group -> '(' req_sequence ')'
optional -> '[' req_sequence ']'
rep -> '...'
A conventional parser transforms the source into an abstract syntax tree which will then be traversed and acted upon.
For example, a compiler traverses the abstract syntax tree to generate native or byte code.
However, in mow.cli, the process is a bit more inception like: the spec string is parsed not into an abstract syntax tree but into a second parser which will be later used to parse the program call arguments:
An FSM is composed of states and transitions:
- The states are abstract, they do not represent anything in particular, just a position in the command line parsing process.
Reaching a terminal state means that what was parsed so far is valid according to the spec string - The transitions are concrete matchers.
For example, a transition can be triggered when encoutering the an option named-f
or an argument.
The big picture
Before delving into how the FSM is contsructed, here’s a quick example:
Given the following spec string (from the cp
command man page):
[-R [-H | -L | -P]] SRC... DST
The corresponding FSM is:
This may seem intimidating at first, but it really isn’t that hard.
Simply start from the entry point S1
and follow the transitions until the exit point S47
is reached.
some possible routes:
S1 -> S9 -> S31 -> S41 -> S47
: matches the callcp -R -H SRC DST
S1 -> S26 -> S41 -> S41 -> S47
: matches the callcp -R -H SRC SRC DST
- etc.
In the following, I’ll explain how the previous FSM was constructed from the spec string [-R [-H | -L | -P]] SRC... DST
.
The spec parser
As described in another article of mine, transforming an EBNF grammar into a recursive descent parser is almost a mechanical task.
And so, unsurprisingly, for every non-terminal in the spec grammar there is a similarly named method which will parse it.
For example, the spec parser defines a choice
method for the choice
production rule:
func (p *uParser) choice() (*state, *state) {
...
}
A state is a go struct representing a state in the FSM and contains a list of outgoing transitions to other states.
It also contains a boolean flag to indicate if it is terminal or not.
As can be seen in the method signature above, all the parsing methods return a pair of states representing the entry and exit points of the partial FSM.
It is possible to only return the entry point of the FSM.
The exit points can then be retrieved by traversing the transitions until reaching the terminal states (there could be multiple exit points).
However, I decided to always return exactly one entry point and one exit point to make my life easier (as would be seen below).
Here’s how mow.cli transforms the different spec string components into FSMs.
Options
Spec:
-f
FSM:
Easy: one state with a single transition with the option to a terminal state.
Arguments
Spec:
SRC
FSM:
Same as before, the execution starts in the S1
state.
The only possible transition is when an argument (anything which doesn’t start with a dash) is encountered.
Optionality
Rendering an FSM component optional is a simple matter of creating a shortcut from its start state to its end state.
For example, starting from a 2 positional arguments FSM:
It is a simple matter of creating a shortcut transition from S1
(start) to S3
(end):
A shortcut transition is marked with the *
symbol and can always be followed without consuming any call arguments.
This way, starting from S1
, the FSM can either match an SRC
and a DST
arguments, or it can directly jump to the exit point S3
.
Repetition
To handle repetition, a shortcut transition *
is created from the end state back to the start state.
For example, starting from the FSM for the spec string X Y
:
The FSM for (X Y)...
becomes:
This way, after matching a X
and a Y
arguments and reaching the exit point S3
, the FSM can go back to the entry point S1
to match more arguments.
Choice
We start from n possible alternatives. Each alternative is a partial FSM. The FSM of a choice is then constructed by:
- Create a pair of start and end states
S
andE
- Connect
S
to every partial FSM’s start state using a shortcut*
transition - Connect every partial FSM’s end state to the
E
state using a shortcut*
transition
For example, starting from the FSM for the spec strings -x
and -y
:
The FSM for (X Y)...
becomes:
Here’s an interactive animation to demonstrate the process at work:
Press the play button to start the animation
Press pause to pause it
When paused, you can advance or go back by a frame using the advance and back buttons
You can scroll and zoom in the animation area
As a side note, to create this interactive animation, I used the dagre-d3 JS library which implements graphviz-like graph layouting algorithms and uses D3 for the rendering, plus some hand written Javascript code for the interactive part.
Sequence
The last piece is handling sequences, e.g.:
-f SRC DST
We start from n consecutive components (-f
, SRC
and DST
in the example above).
As always, every component of the sequence is partial FSM with a start and an end state.
One to construct the FSM of a sequence is to connect the end state of every partial FSM to the start state of the following FSM.
Here’s how that process would look like:
This method, while it works for the example above, introduces an annoying side-effect: options order.
Because it connects the partial FSMs in the order in which they appear in the spec string, the generated FSM will only accept the options in that same order.
For example, given a hypothetical command cmd
with the following spec string:
-f [-g] FILE
The resulting FSM, will only accept the following invocations:
cmd -f README.md
cmd -fg README.md
But it will reject cmd -g -f README.md
for example, which is annoying for the command users.
It only gets worse when the number of options grows, like in cp
for example:
cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] SRC... DST
To solve this, mow.cli does not simply (no pun intended) connect the partial FSMs in a linear fashion.
Instead, the following logic is applied:
Given 2 components FSMs A
and B
, each composed of a start and an end state (A.start, A.end)
and (B.start, B.end)
:
- if
A
orB
contains a positional argument, connect them using the process described above as the order of positional arguments is important - else, i.e.
A
andB
contain only options (and maybe shortcuts), construct an FSM which accepts bothA
followed byB
andB
followed byA
:- Create a copy
A'
ofA
andB'
ofB
- Create 2 new states
S
andE
which will become the start and end states of the generated FSM - Connect
S
toA.start
,A.end
toB.start
andB.end
toE
using shortcuts:S->A->B->E
. This will acceptA
thenB
- Connect
S
toB'.start
,B'.end
toA'.start
andA'.end
toE
using shortcuts:S->B'->A'->E
. This will acceptB
thenA
- Create a copy
And since it took me a couple of hours to create this FSM animation thingy, here’s another interactive animation which demonstrates the process described above in action:
The process described above works on 2 components.
Generalizing it to work on more is trivial: apply it to the first 2 components A
and B
which results in A+B
, and then apply the same process on A+B
and the third component C
, etc.
Here’s a more complex example involving 3 options, a choice and an argument:
-a [-b | -c] FILE
The final FSM may seem daunting, but it can get even more scarier with more complex spec strings.
However, the generation logic, as explained above, is composed of very simple generation rules which can be then be composed to produce such monstrosities.
And the complexity of the generated FSM is there for a reason: except for a rules engine1, I don’t think that there is another way to correctly validate a program call arguments according to a spec string (be sure to check the previous article for some cases where docopt either fails to accept valid cases or to reject invalid ones).
The call arguments parser
Once a spec string is transformed into a FSM, the latter can be used to parse the program call arguments using the following algorithm:
The inputs are the current state and the call arguments.
Initially, the current state is the FSM start state and the call arguments are the list of arguments passed by the user to the program:
- If the call arguments list is empty and the current state is terminal, the parsing succeeds, else if fails
- Else, i.e. the call arguments list is not empty: List all the possible transitions from the current state:
- if a transition is a shortcut, it is always possible to follow
- if a transition is an option
-x
, it can only be followed if the first call argument is an option with the same name - if a transition is an argument, it can only be followed if the first argument is a string not starting with a dash
-
(i.e. not an option)
- For every possible transition:
- consume the matching call arguments:
- if the transition is a shortcut, nothing is consumed
- if the transition is an option, consume the first one or two arguments depending on the call syntax and the option type: consume one argument for boolean options or in case of a
=
(e.g.-s=42
), and two arguments otherwise (e.g.-s 42
) - if the transition is an argument, consume exactly one call argument
- set the current state to the target state of the transition
- go back to step 1.
- if following this transition (the call return
true
), also returntrue
- consume the matching call arguments:
- If there are no possible transitions, or all of them failed, fail too
Here’s the same algorithm in Go (simplified for readability)
func parse(currentState state, args []string) bool {
if len(args)==0 {
// no more call args and current is terminal, succeed
return state.terminal
}
for _, transition := range currentState.transitions {
/* transition.matches returns a boolean to indicate
* if it matched or not and the number of consumed
* arguments (0, 1 or 2)
*/
if ok, consumed := transition.matches(args); ok {
/* recursively call parse again with the transition
* target as the current state and with the call
* args minus what the transition consumed
*/
if parse(transition.next, args[consumed:]) {
// this transition succeeded, succeed too !
return true
}
}
}
// none of the transitions matched, fail
return false
}
Now for an example:
Successful parse
Spec string:
[-f|-g] FILE
FSM:
Execution with the argument list -f X
:
State | Args | Comment |
---|---|---|
S | -f X |
args is not empty, so check all the possible transitions |
there are two possible transitions to F1 and G1 |
||
try the first one leading to F1 |
||
F1 | -f X |
args is not empty, so check all the possible transitions |
there is one possible transition to F2 and it consumes one argument |
||
F2 | X |
there is one possible transition to A and it doesn’t consume anything |
A | X |
there is one possible transition to E and it consumes one argument |
E | ø | args is empty and current state is terminal, success ! |
Failed parse
Same spec string but with the argument list -f -g X
:
State | Args | Comment |
---|---|---|
S | -f -g X |
args is not empty, so check all the possible transitions |
there are two possible transitions to F1 and G1 |
||
try the first one leading to F1 |
||
F1 | -f -g X |
args is not empty, so check all the possible transitions |
there is one possible transition to F2 and it consumes one argument |
||
F2 | -g X |
there is one possible transition to A and it doesn’t consume anything |
A | -g X |
there are no possible transitions, fail |
F2 | -g X |
no more transitions to try, fail |
F1 | -g X |
no more transitions to try, fail |
S | -f -g X |
try the second transition to G1 |
G1 | -f -g X |
there are no possible transitions, fail |
S | -f -g X |
no more transitions to try, fail |
Simplification
We’re not done yet.
The FSMs generated using the rules described above, while semantically correct, can cause infinite loops due to the shortcut transitions *
.
For example, given this spec string:
[-e]...
i.e. a repeatable options string, like the -e
flag in the docker run
command for example.
mow.cli would generate the following FSM for such a spec:
The -e
option gets a start and an end states A
and B
with the option name as a transition.
Because it is optional, a shortcut is added from A
to B
.
Finally, because it is repeatable, another shortcut from B
to A
is also added.
In some situations, the algorithm described above will run forever (or until the call stack explodes) bouncing back and forth taking the shortcut transitions between A
and B
.
The shortcuts are very handy during the spec parsing phase to easily construct semantically correct FSM, so we’d rather keep them during that phase.
The solution is to let the spec parsing phase generate shortcuts but to then apply a transformation on the FSM to get rid of them.
Here’s how this is done in mow.cli:
For every state S
:
- Bail out if
S
was already visited - For every transition, recursively apply the algorithm on the next state
- While
S
has a shortcut transitiontr
leading to a stateT
:- Add all of
T
’s transitions toS
(without adding duplicates) - If
T
is terminal, markS
as also terminal - remove the transition
tr
fromS
’s transitions
- Add all of
And here’s how this algorithm would behave when applied to the FSM of the spec string [-e]...
(which causes an infinite loop as shown above):
St | Step | Remark | FSM |
---|---|---|---|
A | 1 | Initial state | |
2 | A has a transition t1 to B , simplify B |
||
B | 1 | B has not already been visited |
|
2 | B has a transition t3 to A , but A was already visited, NOP |
||
3 | B has one shortcut t3 to A |
||
3.1 | add all of A ’s transitions (t1 and t2 ) to B |
||
3.2 | A is not terminal, NOP |
||
3.3 | remove t3 |
||
3 | B has one shortcut t1' to B |
||
3.1 | Nothing to add as B already has all of the target transitions |
||
3.2 | B is already terminal |
||
3.3 | remove t1' |
||
3. | B has no more shortcuts |
||
A | 3. | A has one shortcut t1 to B |
|
3.1 | add all of B ’s transitions to A |
||
3.2 | B is terminal, mark A as also terminal |
||
3.3 | remove t1 |
||
3 | A has no more shortcuts |
We went from:
to:
Both will accept and reject exactly the same set of call args, but the second form doesn’t contain any shortcuts and so is more suitable for parsing as it can’t cause infinite loops.
Backtracking
The backtracking is already implemented in the algorithm above because all the possible transitions will be tried and not only the first that matches.
Here’s how that would work for the cp test™, i.e. with the following spec SRC... DST
and FSM:
State | Args | Comment |
---|---|---|
S1 | A B |
args is not empty, so check all the possible transitions |
there is one possible transition to S2 and it consumes one argument |
||
S2 | B |
args is not empty, so check all the possible transitions |
there are 2 possible transitions to S2 and to S3 |
||
try the first one which loops back to S2 |
||
S2 | ø | args is empty but S2 is not terminal, fail |
S2 | B |
try the second transition leading to S3 |
S3 | ø | args is empty and S3 is terminal, success |
This simple algorithm scales to arbitrarily complex cases.
Fin
To summarize:
- mow.cli uses a recursive descent parser to transform a spec string into an FSM
- the FSM is then simplified to get rid of the shortcuts because they may lead to infinite loops
- the resulting FSM is then used to validate the program call arguments using a backtracking algorithm
This only covers the high-level concepts and does not cover everything mow.cli does, like collecting, converting and storing the various options and arguments values for the user to retrieve them later.
-
I’m sure somebody in the internet will prove me wrong on this point: I’d love you to ! ↩︎