Scala Native allows you to compile your Scala code to a native executable using LLVM. This is good for two reasons:
- Because your code is compiled instead of running on the JVM, it can run more quickly, depending on your use case. In particular it means your application will not suffer the JVM warm-up overhead when it starts up, making Scala Native suitable for command-line tools.
- It allows your Scala code to interoperate with C/C++ libraries and other native code. For example you can use the C stdlib to malloc and free memory.
In this post I'm going to introduce Scala Native by using it to write a simple HTTP server using a C library called libuv. Along the way we'll look at:
- how to work with Scala libraries that have been cross-built for Scala Native
- how to interoperate with C libraries
- how to write unit tests for your Scala Native code
The complete working code is available on GitHub.
Hello world
Before you can use Scala Native, you'll need to install LLVM, plus a couple of other dependencies. On a Mac this is as simple as:
$ brew install llvm bdw-gc re2
The Scala Native team maintain a nice "hello world" giter8 template, so let's use that to set up a skeleton sbt project:
$ mkdir scala-native-webserver
$ cd $_
$ sbt new scala-native/scala-native.g8
The template includes a single Scala file, src/main/scala/Hello.scala:
object Hello extends App {
  println("Hello, World!")
}
Now if you type sbt run, you should see output like this:
[info] Linking (1878 ms)
[info] Discovered 1293 classes and 9489 methods
[info] Optimizing (2712 ms)
[info] Generating intermediate code (624 ms)
[info] Produced 39 files
[info] Compiling to native code (1349 ms)
[info] Linking native code (179 ms)
Hello, World!
Quite a lot of stuff just happened! The Scala Native compiler plugin and sbt-based tooling took the Scala code, compiled it into a Scala Native-specific representation called NIR (Native Intermediate Representation), then translated the NIR into LLVM IR. LLVM then compiled that into native machine code and linked it into an executable binary. Finally the binary was executed and it printed a friendly message to the screen.
This diagram shows the whole pipeline. The blue boxes are tools provided by Scala Native, while the orange ones are part of the LLVM toolchain.
You can run the generated binary directly from the command line if you want to:
$ target/scala-2.11/scala-native-webserver-out
Hello, World!
IntelliJ gotcha
If you open the project in IntelliJ, you'll find that a bunch of stuff from the Scala standard library (e.g. println) shows as red. This is a known issue.
One workaround is to open up "Project Structure" and manually remove the scalalib library (the one highlighted in the screenshot below).
Be aware that you will need to do this every time you re-import the project from sbt.
Adding a Scala dependency
Now that we've conquered "hello world", let's get started on building a webserver.
Delete the Hello.scala file and make a new file webserver/Main.scala:
package webserver
import scala.scalanative.native._
object Main extends App {
  // TODO
}
It would be nice to provide a way for the user to configure the hostname and port to which the server binds. Let's accept the host and port as optional command line arguments.
scopt is my command-line options parser of choice, and fortunately it is cross-published for Scala Native so we can use it.
Add the dependency in build.sbt like so:
libraryDependencies ++= Seq(
  "com.github.scopt" %%% "scopt" % "3.7.0",
)
Note the triple-percent. This means that we need an artifact that is built for both the correct binary version of Scala (2.11.x) and the correct binary version of Scala Native (0.3.x).
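To make the triple-percent concrete, here is a minimal sketch in plain Scala of how the cross-built artifact name is put together. The CrossName object and its artifactId helper are hypothetical names invented for illustration, and the "_native0.3" suffix format is an assumption based on how Scala Native 0.3.x artifacts are usually named. sbt does not expose anything under these names.

```scala
// Illustrative only: models how %% and %%% extend an artifact name.
// CrossName and artifactId are made-up names, not part of sbt.
object CrossName {
  // scalaBinary is e.g. "2.11".
  // nativeBinary is e.g. Some("0.3"), or None for a plain JVM artifact.
  def artifactId(name: String,
                 scalaBinary: String,
                 nativeBinary: Option[String]): String = {
    // Assumed suffix format for Scala Native artifacts: _native<version>
    val platformSuffix = nativeBinary.map(v => s"_native$v").getOrElse("")
    s"$name${platformSuffix}_$scalaBinary"
  }
}
```

Under that assumption, the double-percent form would resolve scopt_2.11, while the triple-percent form would resolve scopt_native0.3_2.11.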
Now we can use scopt just like we would in a normal Scala app:
import scopt.OptionParser
object Main extends App {
  case class ServerConfig(host: String = "127.0.0.1", port: Int = 7000)
  private val parser = new OptionParser[ServerConfig]("hello-scala-native") {
    opt[String]('h', "host").action((x, c) =>
      c.copy(host = x)).text("The host on which to bind")
    opt[Int]('p', "port").action((x, c) =>
      c.copy(port = x)).text("The port on which to bind")
  }
  // parse returns None (after printing a usage message) if the arguments are invalid
  parser.parse(args, ServerConfig()) match {
    case Some(serverConfig) => runServer(serverConfig)
    case None => System.exit(1)
  }
  private def runServer(config: ServerConfig): Unit = ???
}
Adding a C dependency
For our low-level TCP I/O we're going to use a C library called libuv. This high-performance event-driven I/O library is best known for powering Node.js.
First let's install libuv. On a Mac, this can be done with Homebrew:
$ brew install libuv
Next we need to write a Scala facade for the libuv API (or at least the subset of it that we need), so that we can interoperate with it.
Start by creating a new object called uv to represent the libuv API:
package webserver
import scala.scalanative.native._
@link("uv")
@extern
object uv {
  // TODO data types
  // TODO functions
}
Define data types
For all the libuv-defined data types that we need to use, we'll define a corresponding Scala type. For example:
// uv_buf_t
type Buffer = CStruct2[
  CString, // char* base;
  CSize    // size_t len;
]
This is a 2-field struct representing a buffer. It has a pointer to the actual bytes of data, and a field containing the number of bytes.
This data type was nice and simple, but for more complex structs with lots of fields and nested structs, defining the corresponding Scala type by hand is pretty tiresome.
It's also quite tricky to define the type correctly, so that each field in your Scala struct type is exactly the right size and thus matches the corresponding field in the C struct. I ended up with crashes at runtime because my struct types were not as big as they should have been, meaning I was not allocating as many bytes as libuv expected. The library was writing data to bytes that had not actually been allocated.
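To see why field sizes matter so much, here is a toy model of C struct layout in plain Scala. It is only a sketch: StructLayout and Field are hypothetical names, the sizes assume a typical 64-bit platform where pointers and size_t are 8 bytes, and a real compiler's layout rules can differ (packed structs, unions, platform ABIs).

```scala
// Toy model of C struct layout (assumes natural alignment on a 64-bit platform).
// Each field is placed at the next offset that is a multiple of its alignment,
// and the total size is rounded up to the largest alignment of any field.
object StructLayout {
  final case class Field(name: String, size: Int, align: Int)

  // Round offset up to the next multiple of align.
  private def alignUp(offset: Int, align: Int): Int =
    (offset + align - 1) / align * align

  def sizeOf(fields: Seq[Field]): Int = {
    val end = fields.foldLeft(0) { (offset, f) =>
      alignUp(offset, f.align) + f.size
    }
    val maxAlign = if (fields.isEmpty) 1 else fields.map(_.align).max
    alignUp(end, maxAlign)
  }
}
```

Under these assumptions, sizeOf(Seq(Field("base", 8, 8), Field("len", 8, 8))) gives 16 for uv_buf_t, and a 4-byte int followed by an 8-byte pointer also takes 16 bytes because of padding. Leaving a field out of the Scala-side definition produces a smaller number than the C library expects, which is the kind of mismatch described above.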
Define functions
Defining facades for external functions is a lot simpler. You just define the function signature, which must match the signature that the library exposes, and set the right-hand side of the function to extern:
@name("uv_tcp_init")
def tcpInit(loop: Ptr[Loop], handle: Ptr[TcpHandle]): CInt = extern
You use the @name annotation to specify what the function is called in the library, so you are free to name your Scala version whatever you like.
Starting the event loop
Let's fill in the runServer function that we left blank above.
private def runServer(config: ServerConfig): Unit = Zone { implicit z =>
  val socketAddress = alloc[sockaddr_in]
  uv.ipv4Addr(toCString(config.host), config.port, socketAddress)
  val loop = uv.createDefaultLoop()
  println("Created event loop")
  val tcpHandle = alloc[TcpHandle]
  bailOnError(uv.tcpInit(loop, tcpHandle))
  println("Initialised TCP handle")
  bailOnError(uv.tcpBind(tcpHandle, socketAddress, UInt.MinValue))
  println(s"Bound server to ${config.host}:${config.port}")
  bailOnError(uv.listen(tcpHandle, 128, Server.onTcpConnection))
  println("Started listening")
  // uv.run blocks until the event loop stops, so log before calling it
  println("Starting event loop")
  bailOnError(uv.run(loop, DefaultRunMode))
}
The first thing to notice is that the whole function body is wrapped in a Zone block. This is so that we can use Scala Native's handy zone allocation feature. Any memory allocated using alloc inside a Zone block is automatically released at the end of the block, so you don't need to worry about freeing it manually.
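As a rough mental model of that behaviour, here is a toy version in plain Scala. It is not Scala Native's implementation: ToyZone, withZone and liveCount are hypothetical names, and "allocating" here only records a label so that the release at the end of the block can be observed.

```scala
// Toy model of zone allocation. Nothing is really allocated: each call to
// alloc just records a label, and everything is dropped when the block ends.
object ToyZone {
  final class Zone {
    private var live = List.empty[String]

    def alloc(label: String): String = {
      live = label :: live
      label
    }

    def liveCount: Int = live.size

    private[ToyZone] def releaseAll(): Unit = {
      live = Nil
    }
  }

  // Runs the block with a fresh zone and releases everything afterwards,
  // even if the block throws. The zone is returned only so that callers
  // can inspect it after the block has finished.
  def withZone[A](body: Zone => A): (A, Zone) = {
    val zone = new Zone
    try {
      (body(zone), zone)
    } finally {
      zone.releaseAll()
    }
  }
}
```

Inside the block the allocations are live. Once withZone returns, liveCount is back to zero without the caller having released anything by hand.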
Unfortunately, due to the callback-driven nature of the libuv API, we won't be able to use zone allocation in the rest of the program. Memory often needs to be allocated in one place and later freed in a callback somewhere else in the program. So we will have to use good old malloc and free, and try really hard not to leak any memory!
Hopefully the code above is reasonably self-explanatory. bailOnError is a tiny helper function I wrote that prints some useful error information and quits the program if the given call to a libuv function returns a negative status code.
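The helper itself is not shown in this post, so here is a minimal sketch of what such a function could look like, written in plain Scala with CInt modelled as Int. The ErrorHandling object, the check function and the message wording are assumptions for illustration. The real helper presumably asks libuv for a description of the error (libuv provides uv_strerror and uv_err_name for that), which this sketch does not do.

```scala
// Hypothetical sketch of a bailOnError-style helper.
// libuv calls signal failure with a negative status code.
object ErrorHandling {
  // Pure part: turn a status code into either an error message or the status.
  def check(status: Int): Either[String, Int] =
    if (status < 0) Left(s"libuv call failed with status $status")
    else Right(status)

  // Side-effecting part: report the failure and stop the program.
  def bailOnError(status: => Int): Unit =
    check(status) match {
      case Left(message) =>
        System.err.println(message)
        sys.exit(1)
      case Right(_) =>
        ()
    }
}
```

Splitting the pure check from the exit makes the decision logic easy to test without terminating the test process.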
Server.onTcpConnection is a callback that is called when a client connects to the server. We haven't implemented it yet. Let's do that now.
Accepting a TCP connection
Create a new file in the webserver package called Server.scala:
package webserver
import webserver.uv._
import scala.scalanative.native._
object Server {
  // Turn a Scala function into a C function pointer
  // that can be passed as a callback
  val onTcpConnection: CFunctionPtr2[Ptr[TcpHandle], CInt, Unit] =
    CFunctionPtr.fromFunction2(_onTcpConnection)
  def _onTcpConnection(tcpHandle: Ptr[TcpHandle], status: CInt): Unit = {
    println("Got a connection!")
    val loop: Ptr[Loop] = (!tcpHandle._2).cast[Ptr[Loop]]
    println("Allocating a client handle for the request")
    val clientTcpHandle = stdlib.malloc(sizeof[TcpHandle]).cast[Ptr[TcpHandle]]
    println("Initialising client handle")
    bailOnError(uv.tcpInit(loop, clientTcpHandle))
    println("Accepting connection")
    bailOnError(uv.accept(tcpHandle, clientTcpHandle))
    println("Reading request")
    bailOnError(uv.readStart(clientTcpHandle, allocateRequestBuffer, onRead))
  }
  private val onRead = CFunctionPtr.fromFunction3(_onRead)
  private def _onRead(clientHandle: Ptr[TcpHandle],
                      bytesRead: CSSize,
                      buffer: Ptr[Buffer]): Unit = {
    println("TODO handle the data received from the client")
  }
}
The second line of the _onTcpConnection function deserves some explanation:
val loop: Ptr[Loop] = (!tcpHandle._2).cast[Ptr[Loop]]
tcpHandle is a pointer to a struct of type TcpHandle (or uv_tcp_t in the C API). !tcpHandle._2 dereferences the pointer and reads the second field of the struct, which is a pointer to the event loop. Unfortunately due to a known issue in Scala Native (which I don't fully understand), type information about the field is lost, so you need to cast it to convince the compiler of its type.
Also note that we malloc a handle for the client, but we don't free it yet. We'll need to remember to free this later when the connection is closed.
You should now be able to run the server and connect to it using curl localhost:7000. When you connect, you'll see the server print the appropriate log messages to stdout. But because the server doesn't respond with anything yet, curl will just hang.
Handling the request data
Next we'll implement the onRead callback to handle incoming data from the client.
This callback will be called one or more times per request. If it's a small request message (say an HTTP GET), it will only be called once, but if the request is large then it will be read in chunks and the callback will be called once per chunk.
For simplicity we'll assume we are only handling requests that can be read in a single chunk. So as soon as we have read the first chunk of data, we'll turn it into a String, parse it as an HTTP request and write an appropriate response.
private def _onRead(clientHandle: Ptr[TcpHandle],
                    bytesRead: CSSize,
                    buffer: Ptr[Buffer]): Unit = {
  bytesRead match {
    case UV_EOF =>
      // finished reading the request
      // ...snip...
    case n if n < 0 =>
      // error reading the request
      // ... snip ...
    case n =>
      println(s"Read $n bytes of the request")
      val requestAsString: String = readRequestAsString(n, buffer)
      println(s"Freeing request read buffer of size ${!buffer._2}")
      stdlib.free(!buffer._1)
      parseAndRespond(clientHandle, requestAsString)
  }
}
// Turn a C array of bytes into a Java String
private def readRequestAsString(bytesRead: CSSize,
                                buffer: Ptr[Buffer]): String = {
  val buf: Ptr[CChar] = !buffer._1
  val bytes = new Array[Byte](bytesRead.toInt)
  var c = 0
  while (c < bytesRead) {
    bytes(c) = !(buf + c)
    c += 1
  }
  new String(bytes, Charset.defaultCharset())
}
private def parseAndRespond(clientHandle: Ptr[TcpHandle],
                            rawRequest: String): Unit = {
  // TODO parse HTTP request
  // TODO write appropriate response
}
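The handler above relies on the whole request arriving in one chunk. For comparison, here is a minimal sketch in plain Scala of how chunks could be buffered until the blank line that ends the request head has been seen. RequestHeadAccumulator is a hypothetical helper and is not part of the server in this post. It also ignores request bodies, and it assumes each chunk has already been decoded to a String, which could split a multi-byte character across two chunks.

```scala
// Hypothetical helper: buffers chunks until the terminator of the
// HTTP request head ("\r\n\r\n") has been received.
final class RequestHeadAccumulator {
  private val received = new StringBuilder

  // Returns Some(head) once the terminator has arrived, None before that.
  def append(chunk: String): Option[String] = {
    received.append(chunk)
    val end = received.indexOf("\r\n\r\n")
    if (end >= 0) Some(received.substring(0, end + 4)) else None
  }
}
```

A server using this approach would need one accumulator per client connection and would only call the parser once append returns a complete head.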
Parsing a request
To parse HTTP requests we're going to use Li Haoyi's FastParse, a nice parsing library that happens to be cross-published for Scala Native.
Given an HTTP request message like:
GET /foo/bar?a=b HTTP/1.1
Host: localhost:7000
User-Agent: curl/7.54.0
Accept: */*
we'd like to parse it into an HttpRequest instance:
case class StartLine(httpMethod: String,
                     requestTarget: String,
                     httpVersion: String)
case class Header(key: String, value: String)
case class HttpRequest(startLine: StartLine, headers: Seq[Header])
Defining the appropriate parsers is pretty simple:
val httpMethod = P ( CharPred(CharPredicates.isUpper).rep(min = 1).! )
val requestTarget = P ( CharsWhile(_ != ' ').! )
val httpVersion = P (
  "HTTP/" ~ CharsWhileIn(List('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')).!
)
val startLine: Parser[StartLine] = P(
  httpMethod ~ " " ~ requestTarget ~ " " ~ httpVersion ~ "\r\n"
) map StartLine.tupled
val header: Parser[Header] = P (
  CharsWhile(_ != ':').! ~ ": " ~ CharsWhile(_ != '\r').! ~ "\r\n"
) map Header.tupled
val startLineAndHeaders = P (
  startLine ~ header.rep
) map HttpRequest.tupled
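To make the grammar concrete without pulling in any dependency, here is a hand-rolled sketch of roughly the same rules in plain Scala. It is not the FastParse implementation: SimpleHttp and its helpers are hypothetical, the case classes are repeated only so that the block compiles on its own, and the behaviour differs in one known way, namely that a malformed header line is reported as an error here instead of simply ending the header list.

```scala
case class StartLine(httpMethod: String, requestTarget: String, httpVersion: String)
case class Header(key: String, value: String)
case class HttpRequest(startLine: StartLine, headers: Seq[Header])

// Dependency-free approximation of the grammar (hypothetical helper).
object SimpleHttp {
  // METHOD SP request-target SP "HTTP/" digits-and-dots
  private def parseStartLine(line: String): Either[String, StartLine] =
    line.split(" ", -1) match {
      case Array(method, target, version)
          if method.nonEmpty && method.forall(_.isUpper) &&
            target.nonEmpty &&
            version.startsWith("HTTP/") && version.length > 5 &&
            version.drop(5).forall(c => c.isDigit || c == '.') =>
        Right(StartLine(method, target, version.drop(5)))
      case _ =>
        Left(s"Malformed start line: $line")
    }

  // key ": " value
  private def parseHeader(line: String): Either[String, Header] = {
    val sep = line.indexOf(": ")
    if (sep > 0) Right(Header(line.substring(0, sep), line.substring(sep + 2)))
    else Left(s"Malformed header: $line")
  }

  def parseRequest(raw: String): Either[String, HttpRequest] = {
    // Keep only the segments that were terminated by "\r\n".
    val terminated = raw.split("\r\n", -1).toList.dropRight(1)
    terminated match {
      case Nil =>
        Left("Missing start line")
      case first :: rest =>
        parseStartLine(first) match {
          case Left(error) => Left(error)
          case Right(startLine) =>
            // Headers run until the first empty line.
            val parsed = rest.takeWhile(_.nonEmpty).map(parseHeader)
            parsed.collectFirst { case Left(error) => error } match {
              case Some(error) => Left(error)
              case None =>
                Right(HttpRequest(startLine, parsed.collect { case Right(h) => h }))
            }
        }
    }
  }
}
```

Like the FastParse version, it captures only the part of the version after "HTTP/", so the example request yields StartLine("GET", "/foo/bar?a=b", "1.1").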
Now that we have a parser, we can fill in the parseAndRespond function. Here is where a real server would do something useful, e.g. serve a file from the filesystem. But for the sake of demonstration we'll just parrot back some information that we've parsed from the request.
private def parseAndRespond(clientHandle: Ptr[TcpHandle],
                            rawRequest: String): Unit = {
  val responseText = Http.parseRequest(rawRequest) match {
    case Right(parsedRequest) =>
      val entity = "<big blob of HTML, omitted for brevity>"
      // Content-Length is a byte count, so measure the UTF-8 encoding
      // rather than the number of characters
      val entityLength = entity.getBytes(StandardCharsets.UTF_8).length
      s"""HTTP/1.1 200 OK\r
         |Connection: close\r
         |Content-Type: text/html; charset=utf-8\r
         |Content-Length: $entityLength\r
         |\r
         |$entity""".stripMargin
    case Left(_) =>
      ErrorResponse // return a stock 400 Bad Request response
  }
  writeResponse(clientHandle, responseText.getBytes(StandardCharsets.UTF_8))
}
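The stock error response is not shown above, so here is one way the response assembly could be factored out and checked on the JVM. This is only a sketch: HttpResponses, render and badRequest are hypothetical names, and unlike the ErrorResponse used in parseAndRespond, badRequest here is already encoded as bytes.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper for assembling HTTP/1.1 responses.
object HttpResponses {
  def render(statusLine: String, contentType: String, entity: String): Array[Byte] = {
    val body = entity.getBytes(StandardCharsets.UTF_8)
    // Content-Length counts the bytes of the encoded body, not its characters.
    val head =
      s"$statusLine\r\n" +
        "Connection: close\r\n" +
        s"Content-Type: $contentType\r\n" +
        s"Content-Length: ${body.length}\r\n" +
        "\r\n"
    head.getBytes(StandardCharsets.US_ASCII) ++ body
  }

  // One possible stock reply for requests that fail to parse.
  val badRequest: Array[Byte] =
    render("HTTP/1.1 400 Bad Request", "text/plain; charset=utf-8", "Bad Request")
}
```

Measuring the encoded body matters as soon as the entity contains non-ASCII text: "h\u00e9llo" is five characters but six bytes in UTF-8, and a client that trusts a Content-Length of 5 would cut the response short.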
Testing with μTest
Because the HTTP request parsing takes a Java String as input and doesn't do any C interop, it's easy to unit-test.
For testing we'll use μTest (also made by Li Haoyi), a minimal testing framework that works with Scala Native.
By now our build.sbt looks like this:
scalaVersion := "2.11.12"
enablePlugins(ScalaNativePlugin)
nativeLinkStubs := true
libraryDependencies ++= Seq(
  "com.github.scopt" %%% "scopt" % "3.7.0",
  "com.lihaoyi" %%% "fastparse" % "1.0.0",
  "com.lihaoyi" %%% "utest" % "0.6.3" % "test"
)
testFrameworks += new TestFramework("utest.runner.Framework")
Our unit tests for request parsing look like this:
import utest._
object HttpTest extends TestSuite {
  val tests = Tests {
    'parseValidHttpRequest - {
      val raw = "GET /foo/bar?wow=yeah HTTP/1.1\r\nHost: localhost:7000\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\n\r\n"
      val parsed = Http.parseRequest(raw).right.get
      assert(parsed.startLine == StartLine("GET", "/foo/bar?wow=yeah", "1.1"))
      assert(parsed.headers.toList == List(
        Header("Host", "localhost:7000"),
        Header("User-Agent", "curl/7.54.0"),
        Header("Accept", "*/*")
      ))
    }
    'parseInvalidHttpRequest - {
      assert(Http.parseRequest("yolo").isLeft)
    }
  }
}
You can run them using sbt test.
Writing the response
All that remains is to write the response, close the connection and (hopefully) free all the memory we've allocated.
private def writeResponse(clientHandle: Ptr[TcpHandle],
                          responseBytes: Array[Byte]): Unit = {
  println(s"Allocating a buffer for the response (${responseBytes.length} bytes)")
  val responseBuffer = stdlib.malloc(responseBytes.length)
  var c = 0
  while (c < responseBytes.length) {
    responseBuffer(c) = responseBytes(c)
    c += 1
  }
  println("Allocating a wrapper for the response buffer")
  val buffer = stdlib.malloc(sizeof[Buffer]).cast[Ptr[Buffer]]
  !buffer._1 = responseBuffer
  !buffer._2 = responseBytes.length
  println("Allocating a Write for the response")
  val req = stdlib.malloc(sizeof[Write]).cast[Ptr[Write]]
  // Store a pointer to the response buffer in the 'data' field
  // to make it easy to free it later
  !req._1 = buffer.cast[Ptr[Byte]]
  bailOnError(uv.write(req, clientHandle, buffer, 1.toUInt, onWritten))
}
private def _onWritten(write: Ptr[Write], status: CInt): Unit = {
  println(s"Write succeeded: ${status >= 0}")
  val buffer = (!write._1).cast[Ptr[Buffer]]
  println(s"Freeing the response buffer (${(!buffer._2).cast[CSize]} bytes)")
  stdlib.free(!buffer._1)
  println("Freeing the wrapper for the response buffer")
  stdlib.free(buffer.cast[Ptr[Byte]])
  val clientHandle = (!write._6).cast[Ptr[TcpHandle]]
  uv.close(clientHandle, onClose)
  println("Freeing the Write for the response")
  stdlib.free(write.cast[Ptr[Byte]])
}
All together now
No doubt you're pretty lost after reading through all these callback-riddled code snippets. It might make more sense if you look at the complete working code, which is available on GitHub.
It works!
If you run the app and access localhost:7000 in your browser, you should see something like this.
Summary
If you value the safety of strong static types but you also enjoy shooting yourself in the foot by messing up your pointer arithmetic, then Scala Native is the project for you! Just kidding. This post was a lot of fun to write, and I can really see the potential of Scala Native.
The user experience is a little rough around the edges at the moment (e.g. I encountered a lot of incomprehensible stack traces from the compiler plugin and had to solve them pretty much through trial and error). But the fact that it works at all is pretty miraculous, and is testament to the amount of hard work put into the project by Denys Shabalin and the rest of the Scala Native contributors. I'm sure as time goes by and the project matures, it will grow into a pillar of the Scala ecosystem.