Archive for July, 2015

Python 2 vs Python 3: AST differences

Python 3 is 6 years old at the time of this writing. However, Python 2 is still widely used. For this reason, I try to support Python 2 and 3 simultaneously in noWorkflow.

Since noWorkflow uses the AST to extract part of the captured provenance, I decided to create this post to talk about the differences that I found in the ast module.

Of course, many differences are related to the grammar changes and are well documented at https://docs.python.org/3.0/whatsnew/3.0.html. However, some changes, such as [4] and [6], were unexpected to me, since they did not follow any grammar change.

I performed my comparisons mostly on versions 2.7.6 and 3.4.0. I am not sure if anything has changed in newer releases.

AST Definition

The most basic comparison that can be done is the diff of the ast definition. The definition can be found on the official Python docs website: ast 2, ast 3.

 

{
         [1]
    stmt = FunctionDef(identifier name, arguments args,
                           stmt* body, expr* decorator_list,
                           {+expr? returns+})
         [2]
         | ClassDef(identifier name, expr* bases,
                    {+keyword* keywords, expr? starargs, expr? kwargs,+}
                    stmt* body, expr* decorator_list)
         [3]
         [-| Print(expr? dest, expr* values, bool nl)-]
         [4]
         | With([-expr context_expr, expr? optional_vars,-]
                {+withitem* items,+} stmt* body)
         [5]
         [-| Raise(expr? type, expr? inst, expr? tback)-]
         {+| Raise(expr? exc, expr? cause)+}
         [6]
         [-| TryExcept(stmt* body, excepthandler* handlers,
                          stmt* orelse)-]
         [-| TryFinally(stmt* body, stmt* finalbody)-]
         {+| Try(stmt* body, excepthandler* handlers, stmt* orelse,
               stmt* finalbody)+}
         [7]
         [-| Exec(expr body, expr? globals, expr? locals)-]
         [8]
         {+| Nonlocal(identifier* names)+}
         | ...

    expr =
         [9]
         {+| YieldFrom(expr value)+}
         [10]
         [-| Repr(expr value)-]
         [11]
         {+| Bytes(bytes s)+}
         [12]
         {+| NameConstant(singleton value)+}
         [13]
         {+| Ellipsis+}
         [14]
         {+| Starred(expr value, expr_context ctx)+}
         | ...

    [13]
    slice = [-Ellipsis |-] Slice(expr? lower, expr? upper, expr? step)
          | ...
    [15]
    excepthandler = ExceptHandler(expr? type, [-expr?-] {+identifier?+} name,
                                      stmt* body)
    [1]
    arguments = ([-expr*-] {+arg*+} args, [-identifier?-] {+arg?+} vararg,
                 {+arg* kwonlyargs, expr* kw_defaults,+}
                 [-identifier?-] {+arg?+} kwarg, expr* defaults)
    {+arg = (identifier arg, expr? annotation)+}
    [4]
    {+withitem = (expr context_expr, expr? optional_vars)+}
}

In this diff, [-...-] marks things removed from Python 2 and {+...+} marks things added in Python 3.
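
A quick way to see which side of this diff an interpreter is on is to ask its ast module which node classes it defines. The following is only a minimal sketch: the two name lists are copied from the diff, and the helper name available_nodes is made up for this example.

```python
import ast
import sys

# Node names copied from the diff: dropped after Python 2 / new in Python 3.
PYTHON2_ONLY = ["Print", "TryExcept", "TryFinally", "Exec", "Repr"]
PYTHON3_ONLY = ["Try", "Nonlocal", "YieldFrom", "Starred", "withitem", "arg"]


def available_nodes(names):
    """Return the names that the running interpreter's ast module defines."""
    return [name for name in names if hasattr(ast, name)]


print("Running on Python %d.%d" % sys.version_info[:2])
print("Python 2 nodes present: %s" % available_nodes(PYTHON2_ONLY))
print("Python 3 nodes present: %s" % available_nodes(PYTHON3_ONLY))
```

On a Python 3 interpreter the first list should come out empty and the second one complete. On Python 2.7 it should be the other way around.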

I numbered the changes in the diff from 1 to 15:

  1. Add support for annotations through a new arg node (an identifier plus an optional annotation) and the new returns field of FunctionDef. Also add support for keyword-only arguments.
    # Python 3
    def fn(a: "first argument", b: int, *, c=2) -> "result":
        pass
  2. Add support for keywords, star args and kwargs in the list of bases of a class.
    # Python 3
    class MyClass(metaclass=MyMetaClass):
        pass
  3. print is not a statement anymore. Now it is a function.
    # Python 2
    print 2
    # Python 3
    print(2)
  4. Change the way With nodes are parsed. In Python 2, the following code generates two With nodes. However, in Python 3, it generates only one With node and two withitem nodes.
    with open('a', 'w') as f1, open('b', 'w') as f2:
        pass
  5. Change raise syntax to remove the comma-separated form. Tuples cannot be substituted for exceptions in Python 3.
    # Python 2
    raise E, V, T
    # Python 3
    raise E(V).with_traceback(T)
  6. Unify try-except and try-finally. In Python 2, the following code generates a TryExcept node inside a TryFinally node, while in Python 3 it generates a single Try node.
    try:
        pass
    except (Exception1, Exception2) as target:
        pass
    else:
        pass
    finally:
        pass
  7. exec is no longer a keyword.
    # Python 2
    exec "a = 1" in global_dict, local_dict
    # Python 3
    exec("a = 1", global_dict, local_dict)
  8. Add keyword nonlocal to provide access to names in outer scopes.
    # Python 3
    def outer():
        a = 1
        def inner():
            nonlocal a
            a += 2
        inner()
  9. Add yield from to delegate to a subgenerator.
    # Python 3
    def my_generator():
        yield from other_generator()
    
  10. Remove backticks repr.
    # Python 2
    a = `b`
    # Python 3
    a = repr(b)
    
  11. Make all strings unicode and create a new bytes type.
    # Python 2
    a = 's'
    # Python 3
    a = b's'
    
  12. Turn None, True and False into keywords. In Python 2, True and False were ordinary names, so it was possible to assign values to them.
    # Python 2
    True, False = False, True
    
  13. Turn Ellipsis (…) into a general expression element. In Python 2, it could only be used inside slices.
    # Python 2
    a = b[...]
    # Python 3
    def func():
        ...
    
  14. Add extended iterable unpacking.
    # Python 2
    first, middle, last = s[0], s[1:-1], s[-1]
    # Python 3
    first, *rest, last = s
    
  15. Change except syntax to remove ambiguity between exception type and exception name.
    # Python 2
    try:
        pass
    except NameError, e:
        pass
    # Python 3
    try:
        pass
    except NameError as e:
        pass
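
Changes [4] and [6] matter most to code that walks the tree, because the same source produces differently shaped ASTs. Below is a minimal sketch of how a tool could count context managers and find try-related nodes without checking the interpreter version. The helper names are hypothetical and are not taken from noWorkflow.

```python
import ast

WITH_SOURCE = """
with open('a', 'w') as f1, open('b', 'w') as f2:
    pass
"""

TRY_SOURCE = """
try:
    pass
except (Exception1, Exception2) as target:
    pass
else:
    pass
finally:
    pass
"""

# Only the try-related classes that exist in the running interpreter.
TRY_CLASSES = tuple(
    getattr(ast, name)
    for name in ("Try", "TryExcept", "TryFinally")
    if hasattr(ast, name)
)


def count_context_managers(tree):
    """Count context managers, whichever way With nodes are laid out."""
    total = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.With):
            # Python 3 keeps a list of withitem nodes; a Python 2 With
            # node holds exactly one context manager.
            total += len(node.items) if hasattr(node, "items") else 1
    return total


def try_node_names(tree):
    """Return the class names of every try-related node in the tree."""
    return [type(n).__name__ for n in ast.walk(tree)
            if isinstance(n, TRY_CLASSES)]


with_tree = ast.parse(WITH_SOURCE)
try_tree = ast.parse(TRY_SOURCE)
print(count_context_managers(with_tree))
print(try_node_names(try_tree))
```

On Python 3 this prints 2 and ['Try']. On Python 2.7 the count should stay the same, while the second list should contain both a TryFinally and a TryExcept entry.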
    

Grammar Differences

There are some differences in the grammars that result in either invalid code or a different AST for the same code.

  1. A negative number such as -5 is a single Num node in Python 2. In Python 3, it is a UnaryOp node (USub) applied to the Num node 5.
  2. It is not possible to split Ellipsis (…) across lines in Python 3.
    # Python 2 / 3
    a[...]
    # Python 2 only
    a[.
      .
      .]
  3. Python 2 creates a Name node with id ‘None’ for the step of a slice when the second ‘:’ is left empty. Python 3 just leaves the step empty.
    a[2::]
  4. In Python 2, both ‘!=’ and ‘<>’ are mapped to the NotEq node. Python 3 only supports ‘!=’.
    # Python 2 / 3
    a != b
    # Python 2 only
    a <> b
  5. col_offset attributes from With nodes are calculated differently.
    # Python 2
    with x as f: # With node: lineno=2, col_offset=5
        pass
    # Python 3
    with x as f: # With node: lineno=2, col_offset=0
        pass
  6. lineno and col_offset attributes from Call nodes are calculated differently. While Python 2 uses the Expr (name) position, Python 3 uses the ‘(’ position.
    # Python 2
    fn(2) # Call node: lineno=2, col_offset=0
    # Python 3
    fn(2) # Call node: lineno=2, col_offset=2
  7. lineno and col_offset attributes from Attribute nodes are calculated differently. While Python 2 uses the Expr position, Python 3 uses the identifier position.
    # Python 2
    a.b # Attribute node: lineno=2, col_offset=0
    # Python 3
    a.b # Attribute node: lineno=2, col_offset=2
  8. lineno and col_offset attributes from Subscript nodes are calculated differently. While Python 2 uses the Expr position, Python 3 uses the index position.
    # Python 2
    a[
    1] # Subscript node: lineno=2, col_offset=0
    # Python 3
    a[
    1] # Subscript node: lineno=3, col_offset=0
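
Since the numbers above come from 2.7.6 and 3.4.0 and may differ in other releases, it can help to print the node type and position on the interpreter you actually target. This is a minimal sketch that only covers the expression cases from the list; node_position is a hypothetical helper name.

```python
import ast


def node_position(source):
    """Return (node class name, lineno, col_offset) for one expression."""
    node = ast.parse(source, mode="eval").body
    return type(node).__name__, node.lineno, node.col_offset


# One sample per expression case in the list; results vary between versions.
SAMPLES = ["-5", "fn(2)", "a.b", "a[\n1]"]

for sample in SAMPLES:
    print("%r -> %s" % (sample, node_position(sample)))
```

Note that these samples start on line 1, while the examples in the list have a comment on the first line, so the lineno values are shifted by one.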