When the Scala compiler doesn't help
May 2, 2017Much of Scala’s power comes from its flexibility and generality as a language. You can mold it quite extensively to suit your particular problem domain or coding style.
The downside of this is that Scala can sometimes be permissive and accommodating to a fault. You often hear Haskell or F# users attest to a sense of “if it compiles, it works” – in my experience this is not generally the case with Scala.
To illustrate this point, let’s walk through a few examples, all of which are distilled from honest-to-goodness I-swear-I’m-not-making-this-up bugs my team or I have encountered.
Example 1
Setup
The following nonsensical code compiles with no warnings, no errors:
val (a, b, c) =
if (foo) {
"bar"
} else {
Some(10)
}
Bug
This will crash with a MatchError
at runtime. Always. It’s super broken. Why does
the compiler let it through?
Because when an expression has multiple return branches, Scala tries to be helpful, by picking the first common ancestor type of all the branches as the type of the whole expression.
In this case, one branch has type String
and the other has type Option[Int]
, so the compiler
decides that what the developer really wants is for the whole if/else
expression to have type Serializable
,
since that’s the most specific type to claim both String
and Option
as descendants.
And guess what, Tuple3[A, B, C]
is also Serializable
, so as far as the compiler is concerned,
the assignment of the whole mess to (a, b, c)
can’t be proven
invalid. So it gets through with nary a warning, destined to fail at runtime.
Example 2
Setup
I had dashed off some tests that more or less boiled down to something like this, which compiled with no errors or warnings:
"foobar".toList == List('f','o','o','b','a,'r')
Bug
This comparison was, surprisingly, returning false
, and I couldn’t figure out why. Can you?
With syntax highlighting, the problem is clearer:
"foobar".toList == List('f','o','o','b','a,'r')
You see the character a
in the list? There is a typo – I meant to type 'a'
but I missed a
quotation mark and typed 'a
.
As it turns out, the syntax 'blah
is actually valid in Scala, and represents a symbol literal.
Now recall the “helpful” compiler behavior mentioned in Example 1. Instead of yelling at
me for putting a Symbol
in the middle of my Char
list, the compiler generalizes and assumes
what I really wanted from the start was a List[Any]
, as Any
is the first common ancestor
of Char
and Symbol
.
And comparing a List[Char]
with a List[Any]
also raises no objections, because in theory
it’s not impossible for it to work.
Example 3
Setup
Here’s a simple method definition on a class. Works fine.
class Widget {
def doTheThing() =
synchronized {
...
}
}
Somebody makes a small change - adds a log line at the start of the method.
class Widget {
def doTheThing() =
info("Doin' that sweet thing")
synchronized {
...
}
}
Jolly good, doesn’t get much simpler than that. I think most code reviewers would glaze over it in the middle of a big PR. The compiler (the best code reviewer) is also 100% happy with this change and raises no objections.
Bug
Three of Scala’s features come into play:
- Method bodies without
{ }
brackets can only consist of a single expression - Any statements or expressions in a class definition, outside of member definitions, become part of the primary constructor.
- Member definitions and primary constructor content can intermingle with no ordering restrictions.
So the resulting code really acts like this:
class Widget {
// executes within Widget constructor
synchronized {
...
}
def doTheThing() = {
info("Doin' that sweet thing")
}
}
What was previously the body of the doTheThing
method has been silently bumped into the body
of the primary constructor. Oops!
Example 4
Setup
Here’s another example of “I’ll just add one thing” gone awry.
Quick background: Scala has a shorthand syntax _
to represent an anonymous argument
in a lambda function. A number of languages (Perl, Mathematica, Powershell to name a few) have a similar capability.
This allows one to use the following shorthand, which is actually quite nice:
val lst = List(1,2,3)
// these lines can be re-written
lst.map { i => i + 1 }
lst.foreach { i => println(i) }
// like this
lst.map { _ + 1 }
lst.foreach { println(_) }
So let’s say you come across some code like this, and you want to add one little thing: a counter in the loop.
val lst = List(1,2,3)
// let me just add one thing
lst.foreach { println(_) }
// yay
var j = 0
lst.foreach { j += 1; println(_) }
println(s"Looped $j times")
Compiler is happy, looks good right?
Bug
Not so fast. This code will print the following:
1
2
3
Looped 1 times
Why is the counter only incremented once if we are looping over a 3-element list?
Because { j += 1; println(_) }
is not a lambda, it is an expression that increments j
then
yields a lambda, one which does nothing but invoke println
. So the function argument passed to foreach
is just the lambda
that invokes println
, the j += 1
part is only executed once along the way to obtaining this argument.
In very explicit pseudo-code, this is the equivalent of
var j = 0
val f = {
j += 1
return { i => println(i) }
}
lst.foreach(f)
Note that everything would have worked fine if the shortcut syntax was avoided, because the “full-fledged” lambda syntax does capture the whole of the bracketed code into the lambda.
// works as expected
var j = 0
lst.foreach { i => j += 1; println(i) }
println(s"Looped $j times")
Example 5
Setup
This one’s a twofer. Consider the following two test cases, executed with ScalaTest.
def bar() {
"bar"
}
// test 1
("foo" + bar() + "baz") should be("foobarbaz")
// test 2
"short string" should be
("unequal long string that was moved to its own line because it's long")
}
For those not familiar with ScalaTest, using should be
matchers just means “LHS should be
equal to RHS, fail if not.” So given the code above I would think test 1 would pass and test 2 would
fail.
Bug
If you compile and run this, you will find that exactly the opposite happens: test 1 fails while test 2 passes.
The cause of both unexpected results is the same – stray expressions in statement position.
In test 1, bar()
actually returns Unit
, not String
as you might assume from quick inspection. When you declare a Scala
method without a =
(i.e. def bar() { ...
like in the example, as opposed to def bar() = { ...
) that indicates to
the compiler that the method has Unit
return type (i.e. it’s a “void” method). So the
"bar"
string is simply ignored, Unit
is returned, and the LHS string of the test is in
fact "foo()baz"
. Test fails.
In test 2, we see another manifestation of the same problem. Due to the implementation details
of ScalaTest and its matchers DSL, the code "short string" should be
actually represents a complete expression,
whose type is some kind of curried lambda function that expects a RHS value. Nonetheless, it’s
a complete expression in statement position, so its value just gets thrown away. Similarly, ("unequal long string ...")
is a
complete, valid string expression, which also gets ignored. Thus this “test” is really just
two expressions that do nothing, and that’s considered a “pass.” (Also worth noting that
sometimes Scala expressions can flow across line breaks, so it’s not too unreasonable to see this
during a review and assume it works)
To be fair – in this example, both the bar
method and the first line of test 2
trigger warnings from the compiler. It’s certainly good
that warnings are issued, but I’ll explain why that’s not enough in the next section.
Root causes
I see 3 basic themes in these examples, each representing an aspect of Scala I don’t personally care for.
The type system feels very loose
Due to the kind of automatic generalization demonstrated above, I never feel fully confident that the compiler has my back. I’m always wondering what I screwed up, type-wise, that the compiler isn’t telling me about. Call me old-fashioned, but isn’t type safety supposed to be one of a compiler’s strong suits?
I readily acknowledge that some of the more powerful type system capabilities Scala offers hinge on this behavior, and that some devs fully rely on it for more advanced usages. I just personally prefer the safety of something stricter.
Too much syntax is optional
On the one hand, Scala’s syntax is “dynamic” and “flexible.” Scala is great for crafting DSLs, and it accommodates a wide range of coding styles.
That sounds nice in theory, but I’ve found that in practice it leads to headaches. In a team setting, everybody ends up writing their own personal brand of Scala code, and it takes constant policing to maintain a uniform style. With so much variation in syntax, code reviews become more difficult since the visual patterns you are accustomed to from your own syntactic style don’t neccessarily carry over to what you’re reviewing. And the rules are such that mistakes don’t always result in errors - they might just shift you into another supported form.
Example 3 would have been prevented if methods were simply required to have { }
, or
if primary constructor content was required to come before method definitions.
Example 4 would have been prevented if there was a single syntax for lambda functions, or if
the rules for _
, { }
, and ;
weren’t so subtle and overloaded.
The first part of Example 5 would have been easier to catch if return type annotations
and/or the =
in method definitions were required.
The second part of Example 5 would have been prevented if methods required ( )
around
their arguments, or if ScalaTest wasn’t inviting people to use a magical DSL that nobody
can actually reason about.
There is no scoped “nowarn” pragma
Like most languages, Scala has a “fatal warnings” flag which will promote warnings into errors. But it doesn’t have any way to suppress individual warnings.
The result is an “abstinence only” kind of situation – you either have to commit to never introducing a single warning of any kind, or else you can’t benefit from fatal warnings at all.
Here in the real world, it’s to be expected that a big project will pick up a warning here, a deprecation there. These useful warnings that indeed should be fixed, but for whatever reason the team decides they can’t or won’t be fixed yet.
Ideally one could suppress those particular issues while maintaining the protection that fatal warnings provide against new problems. Instead, we are left with a situation where the build log is already sullied with a bunch of (known) warnings, and nobody notices when new ones (e.g. Example 5) are introduced.
[Update]
Interesting discussion on reddit.
There are a number of comments along the lines of “just don’t rely on type inference,” “just use a linter,” “nobody uses that syntax anyway,” etc. I fully agree! We are indeed looking to implement these kind of things in our CI system and style guide, and I expect those efforts to help.
These comments are making my broader point for me, though – you get a remarkably flexible and powerful language in return, but using Scala safely and confidently means avoiding flagship language features, opting in to experimental compilation flags, and maintaining a small constellation of 3rd-party plugins.