My Impressions and Notes from VoxxedDays 2017
Encodo presented a short talk at Voxxed Days 2017 this year, called The truth about code reviews. Sebastian and I also attended the rest of the conference. The following is a list of notes and reactions to the talks.
The keynote was about our place in the history of software engineering. Martin described us more as alchemists than engineers right now, a sentiment with which I can only agree. There is too little precision, too little reproducibility and too little focus on safety for us to qualify as engineers.
He gave as an example the pride with which car companies brag about the hundreds of millions of lines of code they have running in software in their cars: a claim that should send shivers down your spine. We know how this software is written and how it is tested.
Quino has fewer than 100,000 lines of code (about 85,000, at least 15% of which is obsolete) and we’ve been building that for almost 10 years. How a company whose main business is building automobiles guarantees safety and correctness of 300 million lines of code is beyond my comprehension. I would venture that they don’t.
Highly recommended talk. Very interesting. Lots of good history mixed with common-sense recommendations, like the following:
- Code reviews
- Iterative design
- Be an engineer, not an alchemist
- SOFTWARE ENGINEERING: Report on a conference sponsored by the NATO SCIENCE COMMITTEE Garmisch, Germany, 7th to 11th October 1968 (PDF)
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems (PDF)
He discussed a proof-of-concept transport-tracking application. Uses the SBB REST API for vehicle positions (using the same API as exposed for the app). Then there is the OpenData Transport API for station-board information, which provides details about delays. Everything is available as JSON with relatively straightforward data models.
Uses Kafka to handle this real-time data pipeline (kind of like Chronicle, RabbitMQ or EasyMQ, but from Apache). The pipeline reformats the data into the desired format (mostly eliding unwanted data), then sends it to LogStash and on to ElasticSearch, which allows easy querying of the stored data. This type of data isn’t fundamentally relational, so a document-based store is appropriate.
The transformation also involves extrapolating the data that you’re interested in from the data you obtained. For example, determining whether a train is stopped. E.g. are there x events with the same position? Is the position near a station?
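A minimal sketch of that kind of extrapolation, assuming positions have already been projected onto a local grid in metres. The names PositionEvent, is_stopped and nearest_station, the thresholds and the station coordinates are all hypothetical; the talk did not show its actual rules.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class PositionEvent:
    train_id: str
    x: float  # metres east in a hypothetical local grid
    y: float  # metres north in the same grid


def is_stopped(events, min_events=3, tolerance_m=5.0):
    """True if the last `min_events` events all lie within `tolerance_m`
    of the first of those events (i.e. "x events with the same position")."""
    if len(events) < min_events:
        return False
    recent = events[-min_events:]
    first = recent[0]
    return all(hypot(e.x - first.x, e.y - first.y) <= tolerance_m for e in recent)


def nearest_station(event, stations, max_distance_m=200.0):
    """Return the name of the closest station within `max_distance_m`, else None.
    `stations` maps a station name to its (x, y) grid position."""
    best_name, best_dist = None, max_distance_m
    for name, (sx, sy) in stations.items():
        dist = hypot(event.x - sx, event.y - sy)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```

A train would then count as "stopped at a station" when is_stopped is true and nearest_station returns a name for its latest event.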
It was developed in Scala with Akka actors as well as the Play framework for REST. They represented all stations and trains with actors (objects). The actors are async and can run on any number of machines.
After that comes Cassandra? Are they trying to use every possible technology? I’m losing track over here. Deployment on Docker. Also uses Zookeeper in another container for load-balancing/redundancy. OMG buzzwords.
He asks: Why not a single application on a single server? Classic Java on Tomcat? It doesn’t scale. It can only scale up, but not out. The actual solution feels like a lot of moving parts, but each part does a compartmentalized task, handing off to the next piece. It ends up being quite lightweight, using very little CPU overall.
The simple, one-use components scale natively and relatively easily (LogStash, streaming, docker). The app server using Akka can be scaled, but it’s here that you have to invest time to use the available fallback and clustering strategies.
To render the data on the map, they used React to manage the data and d3.js to render. React is fast and scalable (but as Encodo has also discovered, that’s not free either). Also, the client-side CPU usage is not insignificant, even with a lot of nodes.
He also discussed UX and UI with tests. How to visualize possibly overlapping and differently sized elements at different zoom levels.
Used Jupyter to analyze data and produce graphs.
Conclusion: offload the parts of your application that aren’t your core problem to external software and services. Things like managing data streams, transforming data, etc. Focus on your models and analyzing your data.
- Visualizing massive data streams: a public transport use case
- D3.js transitions killed my CPU! A d3.js & pixi.js comparison
- An invitation to reproducible computational research
He discussed how to build reusable structures that don’t share mutable state (non-imperative vs. functional).
- Classic standard libraries define mutable data structures, like lists, arrays, etc. These are not optimal for multi-tasking and asynchronous work. Mutable data structures produce side effects.
Void is a “code smell” because the only reason to call it is to cause a side effect. Prefer pure methods.
- A functional data structure has to be immutable.
- A functional data structure has to be persistent. This is similar to the first property, but it allows for a new structure to be created that is a mutation of the prior version. Obviously, we want to optimize storage here, reusing as much of the prior version as possible (instead of copying).
- This is how mutation works, since we know that the prior version will never change, so it can be freely referenced.
- Return values from methods on functional data structures are referentially transparent. You can cache the value without worrying that it will ever change or disappear.
- This allows an application to work lock-free instead of guarding access to all possibly mutating methods.
- It is easier to reason about functional (pure) data structures.
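To make the persistence property above concrete, here is a minimal sketch of an immutable singly linked list in Python. Node, prepend and to_list are hypothetical names for illustration and are not taken from the talk. Two new versions are built from the same base list, and both reuse it instead of copying it.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a singly linked list."""
    head: Any
    tail: Optional["Node"]


def prepend(value, lst):
    """Return a new list with `value` in front; `lst` itself is not modified."""
    return Node(value, lst)


def to_list(lst):
    """Copy the linked list into a regular Python list (for display only)."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


base = prepend(2, prepend(3, None))  # the "prior version"
a = prepend(1, base)                 # new version 1 -> 2 -> 3
b = prepend(0, base)                 # new version 0 -> 2 -> 3
shared = a.tail is b.tail            # both versions point at the same `base` cells
```

Because base can never change, a and b can both reference it freely, and any thread can read all three without locks.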
Any discussion of data-structure design/implementation will naturally involve balancing performance vs. storage. The safety is baked-in, but performance is always a concern when working with immutable data structures, most especially when changing them.
Even though the average call time for a method is nearly constant (as with most mutable structures), what if you call too many expensive operations and skew the average in real-world use? You can combat this by leveraging the cacheability of your collections (as described above) to memoize results. Memoization is a well-known performance-optimization technique that carries possibly higher storage costs if you can’t share the memoized instances very much.
In some cases, you can reason about performance in the following way: if you get to a situation where you would have to do an expensive operation (e.g. the reverse implicit in balancing head/tail of a queue), you can only get to this situation by having done n cheap operations first. So it is proven that the average is still constant time.
Destructive behavior (like dequeue) looks different than it does with a mutable data structure. In those cases, the operation returns both the removed element and a reference to the queue that represents the new state of the queue.
Tuple<T, Queue<T>> Dequeue();
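As a rough sketch of how such a queue could look, here is the classic two-list (front/back) design in Python. This is my own illustration under that assumption, not code from the talk; Cons, Queue and _reverse are hypothetical names. The reverse is the expensive step, and it can only happen after n cheap enqueue calls have filled the back list.

```python
from typing import Any, NamedTuple, Optional, Tuple


class Cons(NamedTuple):
    """Immutable linked-list cell."""
    head: Any
    tail: Optional["Cons"]


def _reverse(lst):
    """Build a reversed copy of a linked list in O(n)."""
    out = None
    while lst is not None:
        out = Cons(lst.head, out)
        lst = lst.tail
    return out


class Queue(NamedTuple):
    front: Optional[Cons] = None  # elements to dequeue next, oldest first
    back: Optional[Cons] = None   # recently enqueued elements, newest first

    def enqueue(self, value) -> "Queue":
        """O(1): returns a new queue and leaves this one untouched."""
        return Queue(self.front, Cons(value, self.back))

    def dequeue(self) -> Tuple[Any, "Queue"]:
        """Return (removed element, queue in its new state)."""
        front, back = self.front, self.back
        if front is None:
            if back is None:
                raise IndexError("dequeue from empty queue")
            # Expensive step: move everything from back to front at once.
            front, back = _reverse(back), None
        return front.head, Queue(front.tail, back)
```

Usage looks like `value, rest = queue.dequeue()`; the original queue is still valid afterwards. One caveat with this simple version: dequeuing repeatedly from the same old version repeats the reverse each time, which is the situation in which memoizing the reversed list would help.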
For maps, you need a concept called Zip that lets you quickly build a representation of the structure where the element viewed at a particular point in an existing structure is different. So even when a desired mutation would require alteration of a lot of the underlying structure, this operation allows reuse of a lot more of the structure than would otherwise be possible. The node can point to different parent and child nodes, referencing the new part of the structure while embedded in as much of the prior version as possible.
“Object-oriented programming makes it easier to reason about moving parts. Functional programming makes it easier to minimize moving parts.”
This talk began by posing the following questions to the audience.
- Do you work with women in a technical capacity? (My answer: No. The closest I’ve come was a programmer I trained as part of a group of 7 others for a customer.)
- Can you remember having been in a meeting with two women or more? (My answer: A couple of project meetings over the last couple of years, but no-one in a technical capacity. Also some conference calls, but neither of the two female participants was in a technical capacity.)
Good questions. Good topic. Mostly well-presented, although the middle dragged a bit: Sombra envisioned a (near-)future where women are the same as men in a tech world, a meritocracy. It didn’t add very much.
As with everywhere else, the software industry has to figure out how to deal with long maternity leaves. Some countries have introduced “rainbow” leaves, which allow sharing of the time between partners, so if the partner is male, the industry has to deal with male absence as well. That will probably help increase acceptance of female leave, as it removes the distinction.
For small companies, these kinds of extended leaves are a big hurdle because we can’t so easily absorb so much missing capacity.
We haven’t improved at all in the last quarter-century: there have been proportionally fewer women in technical software positions every year since 1991. The quit rate is much higher (41%) than for men (17%). This is not primarily due to family concerns, though. It’s mostly due to women not feeling comfortable in an industry where they’re often the only female in a meeting, on a team or in a company.
This talk is a reduced version of the code-review talk that Sebastian has been doing for Encodo Systems in both English and German over the last year.
The presentation includes some statistics about the value of code reviews, a discussion of which benefits you can expect to get, which types of reviewers are likely to yield which benefits as well as Encodo’s approach and advice for integrating code reviews into your development process.
This was the most informative and amazing presentation at the entire show. All kidding aside, the room was packed and the ratings were quite good. There seemed to be a lot of interest in process.
This guy was supremely entertaining. He is the undisputed master of the animated and reaction GIF in presentations. Informative, spirited and very funny.
- SQL is a 4GL.
- It’s a declarative language.
- He shows off with a calculation of the Mandelbrot set with PostgreSQL (but that’s not the presentation).
- He presents an example of how to address business needs (e.g. how much money per film per day). Shows how simple joins are in SQL
- Then he shows how to do it with classic Java (which sucks). Basically, he shows how good SQL is by slagging on Java
- He shows something that could be on Annotatiomania
- At this point, the Java code is so long that “they can see our code from space.”
- Eager-loading is a code smell. You should actually be able to get the objects that you want in the form that you want. The optimal result type is the exact shape of the data that you want, not the ORM objects.
- “When does that ever happen? Changing requirements. Never.”
- He’s talking about using SQL instead of code because you don’t care about algorithms or storage types or caches—let the database developers worry about that. They’re good at it. And they love it. And the questions that you’re answering are higher-level.
- He discusses how Java streams look much, much nicer. But I think .NET Linq is even nicer … and he doesn’t mention that at all. So he shows how the more readable API is much better in Java … but it’s now exactly how SQL works.
- The Java is now more readable, but it’s still lazy-loading a ton of data you don’t need. You’re doing stuff on the client that the database would do much better.
- Java Streams are so much uglier than Linq. They are forced to use explicitly typed Tuples (because there is still no var) and the tuple elements are unnamed (p2, etc.). C# 6 is still like this, but C# 7 introduced named items for anonymous tuples.
- The general-purpose languages force us to think about these things when they are not our programming domain.
- The database is also capable of caching execution plans and optimizing subsequent queries. Prepared queries are da bomb.
- Any algorithm that produces the correct result is acceptable. It doesn’t matter how you get there.
- We don’t really know what the database is really doing with a declarative syntax. We probably can’t even guess at the optimizations that the clever database is doing. Use “Explain Query” to see the estimated plan and the actual plan (based on the actual data and current situation on the database instance).
- The cardinality is a hint that indicates whether to use a linear or logarithmic algorithm. This will also give a hint as to the order in which the database will load data (e.g. to reduce the dataset as quickly as possible before applying further joins, ordering and restrictions).
- Conclusion: let the database choose the algorithm based on the dataset available and the current state of the database. He shows an example with a histogram: how a query with one filter might use an index whereas a different filter might be more efficient just scanning the whole table (because 90% of the data is required anyway). The database can take disk-access speed into account. How can the developer predict which algorithm to use? The data and deployment environment isn’t known at compile-time. So since you can’t know and you’re not the guy to decide, then you should offload that decision to the software that does know: the SQL database. What about latencies for remote data? Same thing. Let the database decide.
- Unless you’re the one writing the database.
- The database is really good at this. It remembers how well its estimate matched the actual cost and it uses this to improve its execution plan.
- Oracle can actually change the execution plan “in flight” if it sees during execution that an assumption was grievously wrong.
- Also, SQL is functional: no side effects.
- “Coders want to code; they want to do everything themselves.”
- Also: use production data whenever possible so you have a commensurate dataset size.
- Joolambda is a product from his company. Also JOOQ. Looks like Linq, actually. But maybe it works better? Nice clean API which works with arbitrary result sets instead of fixed ORM objects. Can Quino learn something from it? I asked him after the talk where JOOQ gets its metadata and it generates it from the database schema.
- Without hash joins there are a whole lot of algorithms that aren’t available (MySQL).
- Put business logic in the database, but be careful because how do you test it? I talked to Martin, Vlad and Lukas after the talk about testing and we agreed that databases should be immutable (Martin forbids the UPDATE statement in his projects, where he can) and then you basically have an immutable data structure in a separate process with a really powerful and efficient query language over the graph.
- Locks are terribly complicated and performance is unpredictable. He says he’s “lucky to only work with read-only databases. So much easier. So much fun.”
- Summary of chat after talk (some repeat from above): Chatted with Martin, Vlad and Lukas after the talk about testing the database. Martin suggested that you don’t use the update statement, only insert. Lukas responded similarly, saying that we should use SQL for read-only logical queries. Jooq has a metadata generator for analyzing your database so that you can query it. It doesn’t define objects; you can only define the Tuple that you will return. That is pretty cool. Martin also pointed out that you could enforce immutability and store your data in an immutable, queryable graph by using the database.
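To illustrate the “money per film per day” point from the list above, here is a small sketch using Python’s built-in sqlite3 module. The film and payment tables and their data are made up for this example (the talk used its own schema and a different database), and SQLite’s planner is far simpler than the ones he described. The query only states the result shape; the join and grouping algorithms are left to the database, and EXPLAIN QUERY PLAN shows what it chose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and data, just enough to run the query.
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,
        film_id INTEGER NOT NULL REFERENCES film(film_id),
        paid_on TEXT NOT NULL,
        amount REAL NOT NULL
    );
    INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B');
    INSERT INTO payment (film_id, paid_on, amount) VALUES
        (1, '2017-02-23', 4.0), (1, '2017-02-23', 2.5),
        (1, '2017-02-24', 3.0), (2, '2017-02-23', 5.0);
""")

# Declarative: describe the rows you want, not how to compute them.
REVENUE_SQL = """
    SELECT f.title, p.paid_on, SUM(p.amount) AS revenue
    FROM film AS f
    JOIN payment AS p ON p.film_id = f.film_id
    GROUP BY f.title, p.paid_on
    ORDER BY f.title, p.paid_on
"""

# Each row is already the exact shape needed: (title, day, revenue).
revenue = conn.execute(REVENUE_SQL).fetchall()

# Ask the database which access paths it picked for this data.
plan = conn.execute("EXPLAIN QUERY PLAN " + REVENUE_SQL).fetchall()
```

The client never loads individual payment rows; it receives only the aggregated tuples, which is the opposite of lazy-loading ORM objects and summing them in application code.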
Category theory is about Monads, examples of which are
The example she uses shows how to apply category-theory constructs to data-validation. The examples are in Scala, although the API that she presents looks very similar to the terminology used in Java’s Streams API. E.g.
Select() for C# developers. Similarly,
Nullable, although I can’t think of the type analog for
Her validation example is well-made, going from returning an Option, which is no better than a Boolean. Then she shows an Either, but that doesn’t allow for having both sides wrong. This can be done with Either, but it’s painful. That’s why we invented pattern-matching (now available in C# 7).
Then she introduced a Validated, which is capable of returning a list of errors. “Focus on how things compose.”
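Her examples were in Scala; the following is only my rough Python approximation of the idea, with hypothetical names (Valid, Invalid, combine and the two validators). It shows the difference to an Either-style result: instead of stopping at the first failure, combining results collects every error.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Valid:
    value: Any


@dataclass(frozen=True)
class Invalid:
    errors: Tuple[str, ...]


Validated = Union[Valid, Invalid]


def validate_name(name: str) -> Validated:
    if name.strip():
        return Valid(name)
    return Invalid(("name must not be empty",))


def validate_age(age: int) -> Validated:
    if 0 <= age <= 150:
        return Valid(age)
    return Invalid(("age must be between 0 and 150",))


def combine(*results: Validated) -> Validated:
    """Accumulate the errors of all invalid results; if there are none,
    return a Valid holding a tuple of all the values."""
    errors = tuple(e for r in results if isinstance(r, Invalid) for e in r.errors)
    if errors:
        return Invalid(errors)
    return Valid(tuple(r.value for r in results))
```

With this, combine(validate_name(" "), validate_age(-1)) reports both problems at once, while two valid inputs yield a single Valid containing both values.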
The talk was quite short and didn’t introduce much new. The pattern-matching syntax in Scala is a bit wordy.
Since my previous talk was done early, I joined Sebastian in this one. I saw only the tail-end of it, but man are the streams() libraries still really wordy. Welcome to functional programming, Java! Still, I’m disappointed that I can’t use streams() in the Android project I’m working on because it requires Java 8, which forces API level 24, which excludes a lot of devices.
Sebastian said the talk was pretty good.
- She’s from Lagos, Nigeria. Google talk something or other.
- Nicest slides I’ve seen all day.
- Graceful degradation is the solution for only the current best browser. It doesn’t necessarily scale to future versions.
- Most designers test only one version older than the supported version. Encodo tests the versions required in the spec.
- The goal isn’t to dazzle the user, but to deliver the information to the widest possible audience.
- Admittedly, some sites do have “dazzle the user” as a goal
- Use sparse, semantic markup
- Use plain text for the content in the markup
- The basic layout should work without CSS
- CSS is an enhancement
- End-user browser preferences should be respected (e.g. don’t restrict zooming the UI, since a lot of users can’t see so well)
WTF is the squirrel browser? (It turns out it’s UC Browser, popular in China.) Or the one with the strange globe? (Maybe Flock? Not sure.) Does Opera really have higher market-share than IE? Probably globally, right? Phone browser in India/China/etc.
She showed a really cool graph of how many hours you have to work to use 500MB of data. Germany: 1h, Brazil: 56h, US: 6h. Bandwidth matters. A lot. WWW != Wealthy Western Web ammirite?
- Use ARIA roles if you know that you might run on browsers that don’t understand the new tag types (increasingly unlikely). Still, phone browsers in Africa probably have never heard of
- There is no difference between an unsupported CSS property or a bad value or name in the style, selector, etc.
- CSS doesn’t have built-in fallbacks
- Start with sensible HTML (same as above)
- Go “Mobile-first”
- Use media queries
- Use flexbox (it was designed as a progressive enhancement, so vertical-align is ignored when flexing is enabled).
- What about “Offline-first”? That is, making sure that your app works offline to at least some degree. Syncing data can be a pain, depending on the data, though. If you just have data to log, that’s independent of other data, it’s OK.
- Use CSS Feature queries (detect support or NOT support)
- Use progressive enhancement to deal with IE, which doesn’t support feature queries
- A good tip is that a property with a bad value is ignored by the browser.
What about the future of the web? VR? Old devices handed down from the 1st to the 3rd world.
I asked about testing that the progressive enhancements work as programmed, but no-one has any new ideas for testing. It comes down to manual testing to verify that the enhancements and fallbacks work.
He started off the talk as a bandit, reverse-engineering a Base64-encoded name/password. He used Charles to get MITM. It was a nice trick, and it probably works on a lot of devices and apps.
It’s very easy to make a hackable application if you don’t think about security. He uses a nice word-definition slide with pronunciation and usage to make it look all official.
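To show why a Base64-encoded name/password offers no protection once someone can read the traffic, here is a small sketch. It assumes the credentials travel in an HTTP Basic-style header, which is my assumption; the talk did not say exactly how the app sent them. Both function names are hypothetical.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """What a naive client sends: the credentials are encoded, not encrypted."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def recover_credentials(header: str):
    """What anyone who intercepts the header can do; no key is required."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Decoding is a single library call, so the encoding only hides the password from a casual glance, not from a proxy sitting in the middle.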
- Pace is a bit slow at first.
- Pokemon Hack was a MITM; it wasn’t malicious: kids just didn’t feel like walking. Important to remember that if motivation is high, a hacker will try really hard.
- Beer hack was a loyalty hack (Kuba Gretsky)
- Encrypt all the values instead of sending plain-text
- But be careful of where you put your keys
- This one guy Luke Chadwick uploaded his Amazon key to GitHub by accident. Farmers who watch every damned public commit got it, spun up some EC2 instances and started mining BitCoins
- Use security features where possible
- Use certificate pinning with the
- Do NOT trust the device
- Do NOT trust the app; it can be decompiled.
- What about magic strings?
- You can get your keys from a server
- Or you can encrypt them, but what about the encryption key for the encryption key?
- Get the key from the NDK. You can store information in the NDK itself, which is more secure and less decompilable than app code.
- Check that the application name hasn’t been changed.
- Check that the package manager is supported/correct; otherwise, your app has been republished to a new server.
- Or you can also check that the installer is Google or Amazon
- Check your application signature; you can check whether the app was actually compiled by YOU
- Check if the device is rooted (he used some exec() command)
- Check for emulator (If the build fingerprint starts with “emulator”)
- Do not allow users to switch your App into debug mode
- Use ProGuard, DexGuard. ProGuard is the lite version. DexGuard supports a lot of the checks listed above. DexGuard uses non-Latin Unicode for obfuscation. :-) Unfortunately, it’s a per-user charge. That’s per user of your app.
- So, SafetyNet it is! That’s more like it.
- “The Internet is not a Safe Place” (shows a slide of a dirty van with “Hannah Montana Concert Shuttle” sprayed on the side)
- Try to hack your own applications. Always.