The Observation Deck - Bryan Cantrill

Views on software from Bryan Cantrill's deck chair

Assessing software engineering candidates

Fri, 10/05/2018 - 20:02

Note: This blog entry reproduces RFD 151. Comments are encouraged in the discussion for RFD 151.

How does one assess candidates for software engineering positions? This is an age-old question without a formulaic answer: software engineering is itself too varied to admit a single archetype.

Most obviously, software engineering is intellectually challenging; it demands minds that not only enjoy the thrill of solving puzzles, but can also stay afloat in a sea of numbing abstraction. This raw capacity, however, is insufficient; there are many more nuanced skills that successful software engineers must possess. For example, software engineering is an almost paradoxical juxtaposition of collaboration and isolation: successful software engineers are able to work well with (and understand the needs of!) others, but are also able to focus intensely on their own. This contrast extends to the conveyance of ideas, where they must be able to express their own ideas well enough to persuade others, but also be able to understand and be persuaded by the ideas of others — and be able to implement all of these on their own. They must be able to build castles of imagination, and yet still understand the constraints of a grimy reality: they must be arrogant enough to see the world as it isn’t, but humble enough to accept the world as it is. Each of these is a balance, and for each, long-practicing software engineers will cite colleagues who have been ineffective because they have erred too greatly on one side or another.

The challenge is therefore to assess prospective software engineers, without the luxury of firm criteria. This document is an attempt to pull together accumulated best practices; while it shouldn’t be inferred to be overly prescriptive, where it is rigid, there is often a painful lesson behind it.

In terms of evaluation mechanism: using in-person interviewing alone can be highly unreliable and can select predominantly for surface aspects of a candidate’s personality. While we advocate (and indeed, insist upon) interviews, they should come relatively late in the process; as much assessment as possible should be done by allowing the candidate to show themselves as software engineers truly work: on their own, in writing.

Traits to evaluate

How does one select for something so nuanced as balance, especially when the road ahead is unknown? We must look at a wide variety of traits, presented here in the order in which they are traditionally assessed:

  • Aptitude
  • Education
  • Motivation
  • Values
  • Integrity

Aptitude

As the ordering implies, there is a temptation in traditional software engineering hiring to focus on aptitude exclusively: to use an interview exclusively to assess a candidate’s pure technical pulling power. While this might seem to be a reasonable course, it in fact leads down the primrose path to pop quizzes about algorithms seen primarily in interview questions. (Red-black trees and circular linked list detection: looking at you.) These assessments of aptitude are misplaced: software engineering is not, in fact, a spelling bee, and one’s ability to perform during an arbitrary oral exam may or may not correlate to one’s ability to actually develop production software. We believe that aptitude is better assessed where software engineers are forced to exercise it: based on the work that they do on their own. As such, candidates should be asked to provide three samples of their work: a code sample, a writing sample, and an analysis sample.

Code sample

Software engineers are ultimately responsible for the artifacts that they create, and as such, a code sample can be the truest way to assess a candidate’s ability.

Candidates should be guided to present code that they believe best reflects them as a software engineer. If this seems too broad, it can be easily focused: what is some code that you’re proud of and/or code that took you a while to get working?

If candidates do not have any code samples because all of their code is proprietary, they should write some: they should pick something that they have always wanted to write but have been needing an excuse — and they should go write it! On such a project, the guideline to the candidate should be to spend at least (say) eight hours on it, but no more than twenty-four — and over no longer than a two-week period.

If the candidate is to write something de novo and/or there is a new or interesting technology that the organization is using, it may be worth guiding the candidate to use it (e.g., to write it in a language that the team has started to use, or using a component that the team is broadly using). This constraint should be uplifting to the candidate (e.g., “You may have wanted to explore this technology; here’s your chance!”). At Joyent in the early days of node.js, this was what we called “the node test”, and it yielded many fun little projects — and many great engineers.

Writing sample

Writing good code and writing good prose seem to be found together in the most capable software engineers. That these skills are related is perhaps unsurprising: both types of writing are difficult; both require one to create wholly new material from a blank page; both demand the ability to revise and polish.

To assess a candidate’s writing ability, they should be asked to provide a writing sample. Ideally, this will be technical writing, e.g.:

  • A block comment in source code
  • A blog entry or other long-form post on a technical issue
  • A technical architectural document, whitepaper or academic paper
  • A comment on a mailing list or open source issue or other technical
    comment on social media

If a candidate has all of these, they should be asked to provide one of each; if a candidate has none of them, they should be asked to provide a writing sample on something else entirely, e.g. a thesis, dissertation or other academic paper.

Analysis sample

Part of the challenge of software engineering is dealing with software when it doesn’t, in fact, work correctly. At this moment, a software engineer must flip their disposition: instead of an artist creating something new, they must become a scientist, attempting to reason about a foreign world. In having candidates only write code, analytical skills are often left unexplored. And while this can be explored conversationally (e.g., asking for “debugging war stories” is a classic — and often effective — interview question), an oral description of recalled analysis doesn’t necessarily allow the true depths of a candidate’s analytical ability to be plumbed. For this, candidates should be asked to provide an analysis sample: a written analysis of software behavior from the candidate. This may be difficult for many candidates: for many engineers, these analyses may be most often found in defect reports, which may not be public. If the candidate doesn’t have such an analysis sample, the scope should be deliberately broadened to any analytical work they have done on any system (academic or otherwise). If this broader scope still doesn’t yield an analysis sample, the candidate should be asked to generate one to the best of their ability by writing down their analysis of some aspect of system behavior. (This can be as simple as asking them to write down the debugging story that would be their answer to the interview question — giving the candidate the time and space to answer the question once, and completely.)

Education

We are all born uneducated — and our own development is a result of the informal education of experience and curiosity, as well as a better structured and more formal education. To assess a candidate’s education, both the formal and informal aspects of education should be considered.

Formal education

Formal education is easier to assess by its very formality: a candidate’s education is relatively easily evaluated if they had the good fortune of discovering their interest and aptitude at a young age, had the opportunity to pursue and complete their formal education in computer science, and had the further good luck of attending an institution that one knows and has confidence in.

But one should not be bigoted by familiarity: there are many terrific software engineers who attended little-known schools or who took otherwise unconventional paths. The completion of a formal education in computer science is much more important than the institution: the strongest candidate from a little-known school is almost assuredly stronger than the weakest candidate from a well-known school.

In other cases, it’s even more nuanced: there have been many later-in-life converts to the beauty and joy of software engineering, and such candidates should emphatically not be excluded merely because they discovered software later than others. For those that concentrated in entirely non-technical disciplines, further probing will likely be required, with greater emphasis on their technical artifacts.

The most important aspect of one’s formal education may not be its substance so much as its completion. Like software engineering, there are many aspects of completing a formal education that aren’t necessarily fun: classes that must be taken to meet requirements; professors that must be endured rather than enjoyed; subject matter that resists quick understanding or appeal. In this regard, completion of a formal education represents the completion of a significant task. Inversely, the failure to complete one’s formal education may constitute an area of concern. There are, of course, plausible life reasons to abandon one’s education prematurely (especially in an era when higher education is so expensive), but there are also many paths and opportunities to resume and complete it. The failure to complete formal education may indicate deeper problems, and should be understood.

Informal education

Learning is a life-long endeavor, and much of one’s education will be informal in nature. Assessing this informal education is less clear, especially because (by its nature) there is little formally to show for it — but candidates should have a track record of being able to learn on their own, even when this self-education is arduous. One way to probe this may be with a simple question: what is an example of something that you learned that was a struggle for you? As with other questions posed here, the question should have a written answer.

Motivation

Motivation is often not assessed in the interview process, which is unfortunate because it dictates so much of what we do and why. For many companies, it will be important to find those that are intrinsically motivated — those who do what they do primarily for the value of doing it.

Selecting for motivation can be a challenge, and defies formula. Here, open source and open development can be a tremendous asset: it allows others to see what is being done, and, if they are excited by the work, to join the effort and to make their motivation clear.

Values

Values are often not evaluated formally at all in the software engineering process, but they can be critical to determine the “fit” of a candidate. To differentiate values from principles: values represent relative importance versus the absolute importance of principles. Values are important in a software engineering context because we so frequently make tradeoffs in which our values dictate our disposition. (For example, the relative importance of speed of development versus rigor; both are clearly important and positive attributes, but there is often a tradeoff to be had between them). Different engineering organizations may have different values over different times or for different projects, but it’s also true that individuals tend to develop their own values over their career — and it’s essential that the values of a candidate do not clash with the values of the team that they are to join.

But how to assess one’s values? Many will speak to values that they don’t necessarily hold (e.g., rigor), so simply asking someone what’s important to them may or may not yield their true values. One observation is that one’s values — and the adherence or divergence from those values — will often be reflected in happiness and satisfaction with work. When work strongly reflects one’s values, one is much more likely to find it satisfying; when values are compromised (even if for a good reason), work is likely to be unsatisfying. As such, the specifics of one’s values may be ascertained by asking candidates some probing questions, e.g.:

  • What work have you done that you are particularly proud of and why?
  • What mistakes have you made that you particularly regret and why?
  • When have you been happiest in your professional career and why?
  • When have you been unhappiest in your professional career and why?

Our values can also be seen in the way we interact with others. As such, here are some questions that may have revealing answers:

  • Who is someone who has mentored you, and what did you learn from them?
  • Who is someone you have mentored, and what did you learn from them?
  • What qualities do you most admire in other software engineers?

The answers to these questions should be written down to allow them to be answered thoughtfully and in advance — and then to serve as a starting point for conversation in an interview.

Some questions, however, are more amenable to a live interview. For example, it may be worth asking some situational questions like:

  • What are some times that you have felt values come into conflict? How did you resolve the conflict?

  • What are some times when you have diverged from your own values and how did you rectify it? For example, if you value robustness, how do you deal with having introduced a defect that should have been caught?

Integrity

In an ideal world, integrity would not be something we would need to assess in a candidate: we could trust that everyone is honest and trustworthy. This view, unfortunately, is naïve with respect to how malicious bad actors can be; for any organization — but especially for one that is biased towards trust and transparency — it is essential that candidates be of high integrity: an employee who operates outside of the bounds of integrity can do nearly unbounded damage to an organization that assumes positive intent.

There is no easy or single way to assess integrity for people with whom one hasn’t endured difficult times. By far the most accurate way of assessing integrity in a candidate is for them to already be in the circle of one’s trust: for them to have worked deeply with (and be trusted by) someone that is themselves deeply trusted. But even in these cases where the candidate is trusted, some basic verification is prudent.

Criminal background check

The most basic integrity check involves a criminal background check. While local law dictates how these checks are used, the check should be performed for a simple reason: it verifies that the candidate is who they say they are. If someone has made criminal mistakes, these mistakes may or may not disqualify them (much will depend on the details of the mistakes, and on local law on how background checks can be used), but if a candidate fails to be honest or remorseful about those mistakes, it is a clear indicator of untrustworthiness.

Credential check

A hidden criminal background in software engineering candidates is unusual; much more common is a slight “fudging” of credentials or other elements of one’s past: degrees that were not in fact earned; grades or scores that have been exaggerated; awards that were not in fact bestowed; gaps in employment history that are quietly covered up by changing the time that one was at a previous employer. These transgressions may seem slight, but they can point to something quite serious: a candidate’s willingness or desire to mislead others to advance themselves. To protect against this, a basic credential check should be performed. This can be confined to degrees, honors, and employment.

References

References can be very tricky, especially for someone coming from a difficult situation (e.g., fleeing poor management). Ideally, a candidate is well known by someone inside the company who is trusted — but even this poses challenges: sometimes we don’t truly know people until they are in difficult situations, and someone “known” may not, in fact, be known at all. Worse, references are most likely to break down when they are most needed: dishonest, manipulative people are, after all, dishonest and manipulative; they can easily fool people — and even references — into thinking that they are something that they are not. So while references can provide value (and shouldn’t be eliminated as a tool), they should also be used carefully and kept in perspective.

Interviews

For individuals outside of that circle of trust, checking integrity is probably still best done in person. There are several potential mechanisms here:

  • A very broad interview schedule that includes some people clearly subordinate to the candidate. Some people will treat people differently depending on the status that they perceive.

  • A very broad interview schedule that includes some people with a talent for reading others. For example, someone who is effective at sales often has a knack for picking up on subtle body language cues that others will miss.

  • Interviews that deliberately probe, e.g., asking candidates to describe a time that preserving integrity necessitated taking a more difficult path.

  • Interviews that set up role playing, e.g., asking candidates how they would handle a co-worker approaching them privately asking them to do something that they perceived as wrong.

Mechanics of evaluation

Interviews should begin with phone screens to assess the most basic viability, especially with respect to motivation. This initial conversation might include some basic but elementary (and unstructured) homework to gauge that motivation. The candidate should be pointed to material about the company and sources that describe methods of work and specifics about what that work entails. The candidate should be encouraged to review some of this material and send formal written thoughts as a quick test of motivation. If one is not motivated enough to learn about a potential employer, it’s hard to see how they will suddenly gain the motivation to see them through difficult problems.

If and when a candidate is interested in deeper interviews, everyone should be expected to provide the same written material.

Candidate-submitted material

The candidate should submit the following:

  • Code sample (no more than three)
  • Code project, if deemed applicable/appropriate
  • Writing sample (no more than one per category)
  • Analysis sample (no more than three)
  • Written answers to eight questions:
    1. What work have you done that you are particularly proud of and why?
    2. What mistakes have you made that you particularly regret and why?
    3. What is an example of something that you learned that was a struggle
      for you?
    4. When have you been happiest in your professional career and why?
    5. When have you been unhappiest in your professional career and why?
    6. Who is someone who has mentored you, and what did you learn from them?
    7. Who is someone you have mentored, and what did you learn from them?
    8. What qualities do you most admire in other software engineers?

Candidate-submitted material should be collected and distributed to everyone on the interview list.

Before the interview

Everyone on the interview schedule should read the candidate-submitted material, and a pre-meeting should then be held to discuss approach: based on the written material, what are the things that the team wishes to better understand? And who will do what?

Pre-interview job talk

For senior candidates, it can be effective to ask them to start the day by giving a technical presentation to those who will interview them. On the one hand, it may seem cruel to ask a candidate to present to a roomful of people who will be later interviewing them, but to the candidate this should be a relief: this allows them to start the day with a home game, where they are talking about something that they know well and can prepare for arbitrarily. The candidate should be allowed to present on anything technical that they’ve worked on, and it should be made clear that:

  1. Confidentiality will be respected (that is, they can present on proprietary work)

  2. The presentation needn’t be novel — it is fine for the candidate to give a talk that they have given before

  3. Slides are fine but not required

  4. The candidate should assume that the audience is technical, but not necessarily familiar with the domain that they are presenting

  5. The candidate should assume about 30 minutes for presentation and 15 minutes for questions.

The aim here is severalfold.

First, this lets everyone get the same information at once: it is not unreasonable that the talk that a candidate would give would be similar to a conversation that they would have otherwise had several times over the day as they are asked about their experience; this minimizes that repetition.

Second, it shows how well the candidate teaches. Assuming that the candidate is presenting on a domain that isn’t intimately known by every member of the audience, the candidate will be required to instruct. Teaching requires both technical mastery and empathy — and a pathological inability to teach may point to deeper problems in a candidate.

Third, it shows how well the candidate fields questions about their work. It should go without saying that the questions themselves shouldn’t be trying to find flaws with the work, but should be entirely in earnest; seeing how a candidate answers such questions can be very revealing about character.

All of that said: a job talk likely isn’t appropriate for every candidate — and shouldn’t be imposed on (for example) those still in school. One guideline may be: those with more than seven years of experience are expected to give a talk; those with fewer than three are not expected to give a talk (but may do so); those in between can use their own judgement.
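
As a minimal sketch, the experience guideline above can be written as a small function. The function name and the returned labels are hypothetical, chosen only for illustration; the thresholds (seven and three years) are the ones suggested in the text, and the guideline itself is only one possible rule of thumb.

```python
def job_talk_expectation(years_experience: float) -> str:
    """Apply the suggested guideline for whether a candidate gives a job talk.

    The labels returned here are illustrative, not part of any real process.
    """
    if years_experience > 7:
        # More than seven years: a talk is expected.
        return "expected"
    if years_experience < 3:
        # Fewer than three years: not expected, though the candidate may give one.
        return "optional"
    # Everything in between (including exactly three or seven years)
    # is left to the candidate's own judgement.
    return "candidate's judgement"
```

Note that the boundaries are deliberately strict: a candidate with exactly seven years falls into the middle band, because the guideline says "more than seven".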

Interviews

Interviews shouldn’t necessarily take one form; interviewers should feel free to take a variety of styles and approaches — but should generally refrain from “gotcha” questions and/or questions that may conflate surface aspects of intellect with deeper qualities (e.g., Microsoft’s infamous “why are manhole covers round?”). Mixing interview styles over the course of the day can also be helpful for the candidate.

After the interview

After the interview (usually the next day), the candidate should be discussed by those who interviewed them. The objective isn’t necessarily to get to consensus first (though that too, ultimately), but rather to areas of concern. In this regard, the post-interview conversation must be handled carefully: the interview is deliberately constructed to allow broad contact with the candidate, and it is possible that someone relatively junior or otherwise inexperienced will see something that others will miss. The meeting should be constructed to assure that this important data isn’t suppressed; bad hires can happen when reservations aren’t shared out of fear of disappointing a larger group!

One way to do this is to structure the meeting this way:

  1. All participants are told to come in with one of three decisions: Hire, Do not hire, Insufficient information. All participants should have one of these positions and they should not change their initial position. (That is, one’s position on a candidate may change over the course of the meeting, but the initial position shouldn’t be retroactively changed.) If it helps, this position can be privately recorded before the meeting starts.

  2. The meeting starts with everyone who believes Do not hire explaining their position. While starting with the Do not hire positions may seem to give the meeting a negative disposition, it is extremely important that the meeting start with the reservations lest they be silenced — especially when and where they are so great that someone believes a candidate should not be hired.

  3. Next, those who believe Insufficient information should explain their position. These positions may be relatively common, and it means that the interview left the interviewer with unanswered questions. By presenting these unanswered questions, there is a possibility that others can provide answers that they may have learned in their interactions with the candidate.

  4. Finally, those who believe Hire should explain their position, perhaps filling in missing information for others who are less certain.

If there are any Do not hire positions, these should be treated very seriously, for it is saying that the aptitude, education, motivation, values and/or integrity of the candidate are in serious doubt or are otherwise unacceptable. Those who believe Do not hire should be asked for the dimensions that most substantiate their position. Especially where these reservations are around values or integrity, a single Do not hire should raise serious doubts about a candidate: the risks of bad hires around values or integrity are far too great to ignore someone’s judgement in this regard!

Ideally, however, no one has the position of Do not hire, and through a combination of screening and candidate self-selection, everyone believes Hire and the discussion can be brief, positive and forward-looking!

If, as is perhaps most likely, there is some mix of Hire and Insufficient information, the discussion should focus on the information that is missing about the candidate. If other interviewers cannot fill in the information about the candidate (and if it can’t be answered by the corpus of material provided by the candidate), the group should together brainstorm about how to ascertain it. Should a follow-up conversation be scheduled? Should the candidate be asked to provide some missing information? Should some aspect of the candidate’s background be explored? The collective decision should not move to Hire as long as there remain unanswered questions preventing everyone from reaching the same decision.
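
The meeting structure described above can be sketched in a few lines of code, if only to make the ordering and the decision rule explicit. This is an illustration under stated assumptions, not a tool that the process requires: the names `Position`, `speaking_order` and `collective_outcome`, and the outcome strings, are all hypothetical.

```python
from enum import Enum


class Position(Enum):
    DO_NOT_HIRE = "Do not hire"
    INSUFFICIENT_INFORMATION = "Insufficient information"
    HIRE = "Hire"


# Reservations are heard first, then open questions, then support.
SPEAKING_ORDER = [
    Position.DO_NOT_HIRE,
    Position.INSUFFICIENT_INFORMATION,
    Position.HIRE,
]


def speaking_order(initial_positions: dict[str, Position]) -> list[str]:
    """Order interviewers by their initial (recorded) position.

    sorted() is stable, so interviewers holding the same position keep
    the order in which they were recorded.
    """
    return sorted(
        initial_positions,
        key=lambda name: SPEAKING_ORDER.index(initial_positions[name]),
    )


def collective_outcome(initial_positions: dict[str, Position]) -> str:
    """Summarize what the group should do next (illustrative labels only)."""
    if not initial_positions:
        raise ValueError("at least one interviewer position is required")
    held = set(initial_positions.values())
    if Position.DO_NOT_HIRE in held:
        # Any Do not hire is treated very seriously and examined first.
        return "examine reservations"
    if Position.INSUFFICIENT_INFORMATION in held:
        # The decision does not move to Hire while questions remain open.
        return "gather missing information"
    return "hire"
```

For example, with positions recorded as `{"ana": Position.HIRE, "bo": Position.DO_NOT_HIRE, "cy": Position.INSUFFICIENT_INFORMATION}`, the speaking order is bo, cy, ana, and the outcome is to examine the reservations before anything else. The sketch works only from initial positions, which matches the rule that the initial position is recorded and not retroactively changed.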

Assessing the assessment process

It is tautologically challenging to evaluate one’s process for assessing software engineers: one lacks data on the candidates that one doesn’t hire, and therefore can’t know which candidates should have been extended offers of employment but weren’t. As such, hiring processes can induce a kind of ultimate survivorship bias in that it is only those who have survived (or instituted) the process who are present to assess it — which can lead to deafening echo chambers of smug certitude. One potential way to assess the assessment process: ask candidates for their perspective on it. Candidates are in a position to be evaluating many different hiring processes concurrently, and likely have the best perspective on the relative merits of different ways of assessing software engineers.

Of course, there is peril here too: while many organizations would likely be very interested in a candidate who is bold enough to offer constructive criticism on the process being used to assess them while it is being used to assess them, the candidates themselves might not realize that — and may instead offer bland bromides for fear of offending a potential employer. Still, it has been our experience that a thoughtful process will encourage a candidate’s candor — and we have found that the processes described here have been strengthened by listening carefully to the feedback of candidates.

Categories: Personal Blogs

Should KubeCon be double-blind?

Wed, 10/03/2018 - 20:41

With a paltry 13% acceptance rate, KubeCon is naturally going to generate a lot of disappointment — the vast, vast majority of proposals aren’t being accepted. But as several have noted, a small number of vendors account for a significant number of accepted talks. Is this an issue? In particular, review for KubeCon isn’t double-blind; should it be?

In terms of my own perspective here, I view conferences for practitioners (and especially their concomitant hallway tracks) as essential for the community of our craft. Historically, I have been troubled by the strangulation of practitioner conferences by academic computer science: after we presented DTrace at USENIX 2004, I publicly wondered about the fate of USENIX — which engendered some thoughtful discussion. When USENIX had me keynote their annual technical conference twelve years later, I used the opportunity to express my concerns with the conference model, and wondered about finding the right solution both for practitioners and for academic computer science. That evening, we had a birds-of-a-feather session, which (encouragingly) was very well attended. There were many interesting perspectives, but the one that stood out to me was from Kathryn McKinley, who makes a compelling case that reviews should be double-blind. In the BOF, McKinley was emphatic and persuasive that conferences absolutely must be double-blind in their review — and that anything less is a disservice to the community and the discipline.

Wanting to take that advice, when we organized Systems We Love later that year, we ran it double-blind with a very large (and, if I may say, absolutely awesome!) program committee. We had many, many submissions — well over ten times the number of slots! We were double-blind for the first few stages of review, until the number of submissions had been reduced by a factor of five. Once we had reduced the number of talk submissions to “merely” double the number of slots, we de-blinded to get the rest of the way to a program. (Which was agonizing — too many great submissions!) By de-blinding, we were essentially using factors about the submitter as a tie-breaker to differentiate submissions that were both high quality — and as a way to get voices we might not otherwise hear from.

Personally, I feel that we were able to hit a sweet spot by doing it this way — and there were quite a few surprises when we de-blinded. Of note, at least a quarter of the speakers (and perhaps more, as I didn’t ask everyone) were presenting for the first time. Equally surprising: several “big names” had submissions that we rejected while blinded — but looking at their submissions, the submissions themselves just weren’t that great! (Which isn’t to say that they don’t have a ton of terrific work to their name — just that every swing of the bat is not going to be a home run.)

So: should KubeCon be double-blind? I consider myself firmly in McKinley’s camp in that I believe that any oversubscribed conference needs to be double-blind to a very significant degree. That said, I also think our challenges as practitioners don’t exactly map to the challenges in academic computer science. (For example, because we aren’t using conferences as a publishing vector, I don’t think we need to be double-blind-until-accept — I think we can de-blind ourselves to our rejections.) I also don’t even think we need to be double-blind all the way through the process: we should be double-blind until the program committee has reduced the number of submissions to the point that every remaining submission is deemed one that the program committee wants to accept. (That is, to the point that were it not for the physical limits of the conference, the program committee would want to accept the remaining submissions.) De-blinding at this point assures that the quality of the content is primarily due to the merit of the submission — not due to the particulars of the submitter. (That is, not based on what they’ve done in the past — or who their employer happens to be.) That said, de-blinding at the point of quality does allow these other factors to be used to mold the final program.

For KubeCon — and for other practitioner conferences — I think a hybrid model is the best approach: double-blind for a significant fraction of review, de-blinded for a final program formulation, and then perhaps “invited talks” for talks that were rejected when blind, but that the program committee wishes to accept based on the presenter. This won’t lead to less disappointment at KubeCon (13% is too low an acceptance rate to not be rejecting high-quality submissions), but I believe that a significantly double-blind process will give the community the assurance of a program that best represents it!

Categories: Personal Blogs

The relative performance of C and Rust

Sat, 09/29/2018 - 03:28

My blog post on falling in love with Rust got quite a bit of attention — with many being surprised by what had surprised me as well: the high performance of my naive Rust versus my (putatively less naive?) C. However, others viewed it as irresponsible to report these performance differences, believing that these results would be blown out of proportion or worse. The concern is not entirely misplaced: system benchmarking is one of those areas where — in Jonathan Swift’s words from three centuries ago — “falsehood flies, and the truth comes limping after it.”

There are myriad reasons why benchmarking is so vulnerable to leaving truth behind. First, it’s deceptively hard to quantify the performance of a system simply because the results are so difficult to verify: the numbers we get must be validated (or rejected) according to the degree that they comport with our expectations. As a result, if our expectations are incorrect, the results can be wildly wrong. To see this vividly, please watch (or rewatch!) Brendan Gregg’s excellent (and hilarious) lightning talk on benchmarking gone wrong. Brendan recounts his experience dealing with a particularly flawed approach, and it’s a talk that I always show anyone who is endeavoring to benchmark the system: it shows how easy it is to get it totally wrong — and how important it is to rigorously validate results.

Second, even if one gets an entirely correct result, it’s really only correct within the context of the system. As we succumb to the temptation of applying a result more universally than this context merits — as the asterisks and the qualifiers on a performance number are quietly amputated — a staid truth is transmogrified into a flying falsehood. Worse, some of that context may have been implicit in that the wrong thing may have been benchmarked: in trying to benchmark one aspect of the system, one may inadvertently merely quantify an otherwise hidden bottleneck.

So take all of this as disclaimer: I am not trying to draw large conclusions about “C vs. Rust” here. To the contrary, I think that it is a reasonable assumption that, for any task, a lower-level language can always be made to outperform a higher-level one. But with that said, a pesky fact remains: I reimplemented a body of C software in Rust, and it performed better for the same task; what’s going on? And is there anything broader we can say about these results?

To explore this, I ran some statemap rendering tests on SmartOS on a single-socket Haswell server (Xeon E3-1270 v3) running at 3.50GHz. The C version was compiled with GCC 7.3.0 with -O2 level optimizations; the Rust version was compiled with rustc 1.29.0 with --release. All of the tests were run bound to a processor set containing a single core; all were bound to one logical CPU within that core, with the other logical CPU forced to be idle. cpustat was used to gather CPU performance counter data, with one number denoting one run with pic0 programmed to that CPU performance counter. The input file (~30MB compressed) contains 3.5M state changes, and in the default config will generate a ~6MB SVG.

Here are the results for a subset of the counters relating to the cache performance:
Counter                                     statemap-gcc   statemap-rust    change
cpu_clk_unhalted.thread_p                 32,166,437,125  23,127,271,226    -28.1%
inst_retired.any_p                        49,110,875,829  48,752,136,699     -0.7%
cpu_clk_unhalted.ref_p                       918,870,673     660,493,684    -28.1%
mem_uops_retired.stlb_miss_loads               8,651,386       2,353,178    -72.8%
mem_uops_retired.stlb_miss_stores                268,802       1,000,684    272.3%
mem_uops_retired.lock_loads                    7,791,528          51,737    -99.3%
mem_uops_retired.split_loads                     107,969      52,745,125  48752.1%
mem_uops_retired.split_stores                    196,934      41,814,301  21132.6%
mem_uops_retired.all_loads                11,977,544,999   9,035,048,050    -24.6%
mem_uops_retired.all_stores                3,911,589,945   6,627,038,769     69.4%
mem_load_uops_retired.l1_hit               9,337,365,435   8,756,546,174     -6.2%
mem_load_uops_retired.l2_hit               1,205,703,362      70,967,580    -94.1%
mem_load_uops_retired.l3_hit                  66,771,301      33,323,740    -50.1%
mem_load_uops_retired.l1_miss              1,276,311,911     105,524,579    -91.7%
mem_load_uops_retired.l2_miss                 69,671,774      34,616,966    -50.3%
mem_load_uops_retired.l3_miss                  2,544,750       1,364,435    -46.4%
mem_load_uops_retired.hit_lfb              1,393,631,815     157,897,686    -88.7%
mem_load_uops_l3_hit_retired.xsnp_miss               435             526     20.9%
mem_load_uops_l3_hit_retired.xsnp_hit              1,269             740    -41.7%
mem_load_uops_l3_hit_retired.xsnp_hitm               820             517    -37.0%
mem_load_uops_l3_hit_retired.xsnp_none        67,846,758      33,376,449    -50.8%
mem_load_uops_l3_miss_retired.local_dram       2,543,699       1,301,381    -48.8%

So the Rust version is issuing a remarkably similar number of instructions (within less than one percent!), but with a decidedly different mix: just three quarters of the loads of the C version and (interestingly) many more stores. The cycles per instruction (CPI) drops from 0.65 to 0.47, indicating much better memory behavior — and indeed the L1 misses, L2 misses and L3 misses are all way down. The L1 hits as an absolute number are actually quite high relative to the loads, giving Rust a 96.9% L1 hit rate versus the C version’s 77.9% hit rate. Rust also lives much better in the L2, where it has half the L2 misses of the C version.
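The ratios quoted here fall out of the counter table by simple division. As a minimal sketch (the function names are invented for illustration and are not part of statemap or cpustat), the arithmetic looks like this:

```rust
/// Cycles per instruction: unhalted core cycles divided by retired instructions.
fn cpi(cycles: u64, instructions: u64) -> f64 {
    cycles as f64 / instructions as f64
}

/// Fraction of retired loads that hit in the L1 data cache.
fn l1_hit_rate(l1_hits: u64, all_loads: u64) -> f64 {
    l1_hits as f64 / all_loads as f64
}

fn main() {
    // Values copied from the counter table above.
    let gcc_cpi = cpi(32_166_437_125, 49_110_875_829);
    let rust_cpi = cpi(23_127_271_226, 48_752_136_699);
    let gcc_l1 = l1_hit_rate(9_337_365_435, 11_977_544_999);
    let rust_l1 = l1_hit_rate(8_756_546_174, 9_035_048_050);

    println!("CPI:         gcc {:.2}, rust {:.2}", gcc_cpi, rust_cpi);
    println!("L1 hit rate: gcc {:.3}, rust {:.3}", gcc_l1, rust_l1);
}
```

The same two helpers can be pointed at any other pair of rows (for example, L2 hits against L1 misses) to compare the two binaries further down the hierarchy.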

Okay, so Rust has better memory behavior than C? Well, not so fast. In terms of what this thing is actually doing: the core of statemap processing is coalescing a large number of state transitions in the raw data into a smaller number of rectangles for the resulting SVG. When presented with a new state transition, it picks the “best” two adjacent rectangles to coalesce based on a variety of properties. As a result, this code spends all of its time constantly updating an efficient data structure to be able to make this decision. For the C version, this is a binary search tree (an AVL tree), but Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet, which implements a B-tree. B-trees are common when dealing with on-disk state, where the cost of loading a node contained in a disk block is much, much greater than the cost of searching that node for a desired datum, but they are less common as a replacement for an in-memory BST. Rust makes the (compelling) argument that, given the modern memory hierarchy, the cost of getting a line from memory is far greater than the cost of reading it out of a cache — and B-trees make sense as a replacement for BSTs, albeit with a much smaller value for B. (Cache lines are somewhere between 64 and 512 bytes; disk blocks start at 512 bytes and can be much larger.)
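To illustrate the role the ordered set plays, here is a minimal sketch of keeping coalescing candidates in a BTreeSet so that the "best" one is always the first in order. The (weight, id) key is a hypothetical stand-in: the real statemap ordering uses a variety of properties that are not shown here.

```rust
use std::collections::BTreeSet;

/// Return the smallest (weight, id) pair, if any. Tuples compare
/// lexicographically, so the lowest weight sorts first.
fn pick_best(candidates: &BTreeSet<(u64, u64)>) -> Option<(u64, u64)> {
    candidates.iter().next().cloned()
}

fn main() {
    // Hypothetical candidates: (weight, rectangle id).
    let mut candidates: BTreeSet<(u64, u64)> = BTreeSet::new();
    candidates.insert((500, 1));
    candidates.insert((120, 2));
    candidates.insert((980, 3));

    let best = pick_best(&candidates).expect("set is non-empty");

    // Coalescing changes a rectangle's weight, so its key must be removed
    // and re-inserted: this is the constant churn on the tree.
    candidates.remove(&best);
    candidates.insert((best.0 + 500, best.1));

    println!("picked {:?}, next is {:?}", best, pick_best(&candidates));
}
```

Every new state transition implies at least one removal and one insertion of this kind, which is why the tree's cache behavior dominates the profile.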

Could the performance difference that we’re seeing simply be Rust’s data structure being — per its design goals — more cache efficient? To explore this a little, I varied the value of the number of rectangles in the statemap, as this will affect both the size of the tree (more rectangles will be a larger tree, leading to a bigger working set) and the number of deletions (more rectangles will result in fewer deletions, leading to less compute time).

The results were pretty interesting:

A couple of things to note here: first, there are 3.5M state transitions in the input data; as soon as the number of rectangles exceeds the number of states, there is no reason for any coalescing, and some operations (namely, deleting from the tree of rectangles) go away entirely. So that explains the flatline at roughly 3.5M rectangles.

Also not surprisingly, the worst performance for both approaches occurs when the number of rectangles is set at more or less half the number of state transitions: the tree is huge (and therefore has relatively poorer cache performance for either approach) and each new state requires a deletion (so the computational cost is also high).

So far, this seems consistent with the BTreeSet simply being a more efficient data structure. But what is up with that lumpy Rust performance?! In particular there are some strange spikes; e.g., zooming in on the rectangle range up to 100,000 rectangles:

Just from eyeballing it, they seem to appear at roughly logarithmic frequency with respect to the number of rectangles. My first thought was perhaps some strange interference relationship with respect to the B-tree and the cache size or stride, but this is definitely a domain where an ounce of data is worth much more than a pound of hypotheses!

Fortunately, because Rust is static (and we have things like, say, symbols and stack traces!), we can actually just use DTrace to explore this. Take this simple D script, rustprof.d:

    #pragma D option quiet

    profile-4987hz
    /pid == $target && arg1 != 0/
    {
            @[usym(arg1)] = count();
    }

    END
    {
            trunc(@, 10);
            printa("%10@d %A\n", @);
    }

I ran this against two runs: one at a peak (e.g., 770,000 rectangles) and then another at the adjacent trough (e.g., 840,000 rectangles), demangling the resulting names by sending the output through rustfilt. Results for 770,000 rectangles:

    # dtrace -s ./rustprof.d -c "./statemap --dry-run -c 770000 ./pg-zfs.out" | rustfilt
    3943472 records processed, 769999 rectangles
          1043 statemap`<alloc::collections::btree::map::BTreeMap<K, V>>::remove
          1180 statemap`<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::finish
          1208 libc.so.1`memmove
          1253 statemap`<serde_json::read::StrRead<'a> as serde_json::read::Read<'a>>::parse_str
          1320 statemap`<std::collections::hash::map::HashMap<K, V, S>>::remove
          1695 libc.so.1`memcpy
          2558 statemap`statemap::statemap::Statemap::ingest
          4123 statemap`<std::collections::hash::map::HashMap<K, V, S>>::insert
          4503 statemap`<std::collections::hash::map::HashMap<K, V, S>>::get
         26640 statemap`alloc::collections::btree::search::search_tree

And now the same thing, but against the adjacent valley of better performance at 840,000 rectangles:

    # dtrace -s ./rustprof.d -c "./statemap --dry-run -c 840000 ./pg-zfs.out" | rustfilt
    3943472 records processed, 839999 rectangles
           971 statemap`<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write
          1071 statemap`<alloc::collections::btree::map::BTreeMap<K, V>>::remove
          1158 statemap`<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::finish
          1228 libc.so.1`memmove
          1348 statemap`<serde_json::read::StrRead<'a> as serde_json::read::Read<'a>>::parse_str
          1628 libc.so.1`memcpy
          2524 statemap`statemap::statemap::Statemap::ingest
          2948 statemap`<std::collections::hash::map::HashMap<K, V, S>>::insert
          4125 statemap`<std::collections::hash::map::HashMap<K, V, S>>::get
         26359 statemap`alloc::collections::btree::search::search_tree

The samples in btree::search::search_tree are roughly the same — but the poorly performing one has many more samples in HashMap<K, V, S>::insert (4123 vs. 2948). What is going on? The HashMap implementation in Rust uses Robin Hood hashing and linear probing — which means that hash maps must be resized when they hit a certain load factor. (By default, the hash map load factor is 90.9%.) And note that I am using hash maps to effectively implement a doubly linked list: I will have a number of hash maps that — between them — will contain the specified number of rectangles. Given that we only see this at particular sizes (and given that the distance between peaks increases exponentially with respect to the number of rectangles), it seems entirely plausible that at some numbers of rectangles, the hash maps will grow large enough to induce quite a bit more probing, but not quite large enough to be resized.
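The hypothesis can be made concrete with a toy model. This is a sketch under stated assumptions, not the standard library's implementation: it assumes power-of-two capacities, a resize threshold of 10/11 (about 90.9%), and an arbitrary starting capacity of 32. The function names are invented for illustration.

```rust
/// Simplified model (not the standard library's code): capacity is a power
/// of two and doubles whenever the entries would exceed 10/11 of it.
fn modeled_capacity(entries: usize) -> usize {
    let mut capacity = 32; // assumed starting size, for illustration only
    while entries * 11 > capacity * 10 {
        capacity *= 2;
    }
    capacity
}

/// Fraction of the modeled table that is occupied.
fn modeled_load(entries: usize) -> f64 {
    entries as f64 / modeled_capacity(entries) as f64
}

fn main() {
    // Sizes on either side of a modeled resize: the table is most crowded
    // (and probe sequences are longest) just before it doubles.
    for &n in &[1_000usize, 1_800, 1_900, 3_700, 3_800] {
        println!("{:>5} entries: capacity {:>5}, load {:.3}",
                 n, modeled_capacity(n), modeled_load(n));
    }
}
```

In this model the occupancy swings between roughly 45% and 91%, and the crowded sizes recur at each doubling, which would be consistent with peaks that spread out exponentially as the number of rectangles grows.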

To explore this hypothesis, it would be great to vary the hash map load factor, but unfortunately the load factor isn’t currently dynamic. Even then, we could explore this by using with_capacity to preallocate our hash maps, but the statemap code doesn’t necessarily know how much to preallocate because the rectangles themselves are spread across many hash maps.
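For reference, preallocation with with_capacity looks like the following when the size is known up front. This is only a sketch: treating a single map as holding a known number of entries is an assumption made for illustration, and it is exactly the knowledge that statemap lacks.

```rust
use std::collections::HashMap;

/// Fill a preallocated map; returns (capacity before, capacity after).
fn fill_preallocated(expected: usize) -> (usize, usize) {
    let mut rects: HashMap<u64, u64> = HashMap::with_capacity(expected);
    let before = rects.capacity();

    for i in 0..expected as u64 {
        rects.insert(i, i);
    }

    (before, rects.capacity())
}

fn main() {
    // 25,000 is used here as an arbitrary illustrative size for one map.
    let (before, after) = fill_preallocated(25_000);
    println!("capacity before {}, after {}", before, after);
}
```

Because the map was sized before any insertion, filling it to the expected count should not trigger a resize along the way.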

Another option is to replace our use of HashMap with a different data structure — and in particular, we can use a BTreeMap in its place. If the load factor isn’t the issue (that is, if there is something else going on for which the additional compute time in HashMap<K, V, S>::insert is merely symptomatic), we would expect a BTreeMap-based implementation to have a similar issue at the same points.

With Rust, conducting this experiment is absurdly easy:

    diff --git a/src/statemap.rs b/src/statemap.rs
    index a44dc73..5b7073d 100644
    --- a/src/statemap.rs
    +++ b/src/statemap.rs
    @@ -109,7 +109,7 @@ struct StatemapEntity {
         last: Option, // last start time
         start: Option, // current start time
         state: Option, // current state
    -    rects: HashMap<u64, RefCell>, // rectangles for this entity
    +    rects: BTreeMap<u64, RefCell>, // rectangles for this entity
     }
     
     #[derive(Debug)]
    @@ -151,6 +151,7 @@ use std::str;
     use std::error::Error;
     use std::fmt;
     use std::collections::HashMap;
    +use std::collections::BTreeMap;
     use std::collections::BTreeSet;
     use std::str::FromStr;
     use std::cell::RefCell;
    @@ -306,7 +307,7 @@ impl StatemapEntity {
                 description: None,
                 last: None,
                 state: None,
    -            rects: HashMap::new(),
    +            rects: BTreeMap::new(),
                 id: id,
             }
         }

That’s it: because the two (by convention) have the same interface, there is nothing else that needs to be done! And the results, with the new implementation in light blue:
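The shared interface can be seen in isolation with a type alias. In this sketch the alias name, keys and values are all hypothetical; the point is only that the body compiles unchanged whichever map the alias names.

```rust
use std::collections::BTreeMap;
#[allow(unused_imports)]
use std::collections::HashMap;

// Change this alias to `HashMap<u64, &'static str>` and nothing else needs
// to change: both maps offer new/insert/remove/get/len with the same shapes.
type Rects = BTreeMap<u64, &'static str>;

fn exercise() -> (Option<&'static str>, usize) {
    let mut rects = Rects::new();
    rects.insert(10, "running");
    rects.insert(20, "sleeping");
    rects.remove(&10);
    (rects.get(&20).cloned(), rects.len())
}

fn main() {
    let (state, len) = exercise();
    println!("{:?} with {} entry remaining", state, len);
}
```

The one real difference is the bound on the key: HashMap needs keys that are Hash and Eq, while BTreeMap needs keys that are Ord. A u64 key satisfies both.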

Our lumps are gone! In general, the BTreeMap-based implementation performs a little worse than the HashMap-based implementation, but without as much variance. Which isn’t to say that this is devoid of strange artifacts! It’s especially interesting to look at the variation at lower levels of rectangles, when the two implementations seem to alternate in the pole position:

I don’t know what happens to the BTreeMap-based implementation at ~2,350 rectangles (where it degrades by nearly 10% but then recovers when the number of rectangles hits ~2,700 or so), but at this point, the effects are only academic for my purposes: for statemaps, the default number of rectangles is 25,000. That said, I’m sure that digging there would yield interesting discoveries!

So, where does all of this leave us? Certainly, Rust’s foundational data structures perform very well. Indeed, it might be tempting to conclude that, because a significant fraction of the delta here is the difference in data structures (i.e., BST vs. B-tree), the difference in language (i.e., C vs. Rust) doesn’t matter at all.

But that would be overlooking something important: part of the reason that using a BST (and in particular, an AVL tree) was easy for me is because we have an AVL tree implementation built as an intrusive data structure. This is a pattern we use a bunch in C: the data structure is embedded in a larger, containing structure — and it is the caller’s responsibility to allocate, free and lock this structure. That is, implementing a library as an intrusive data structure completely sidesteps both allocation and locking. This allows for an entirely robust arbitrarily embeddable library, and it also makes it really easy for a single data structure to be in many different data structures simultaneously. For example, take ZFS’s zio structure, in which a single contiguous chunk of memory is on (at least) two different lists and three different AVL trees! (And if that leaves you wondering how anything could possibly be so complicated, see George Wilson’s recent talk explaining the ZIO pipeline.)

Implementing a B-tree this way, however, would be a mess. The value of a B-tree is in the contiguity of nodes — that is, it is the allocation that is a core part of the win of the data structure. I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation (and therefore a more complicated and more error-prone interface) that I do imagine that it would have you questioning life choices quite a bit along the way. (After all, a B-tree is a win — but it’s a constant-time win.)

Contrast this to Rust: intrusive data structures are possible in Rust, but they are essentially an anti-pattern. Rust really, really wants you to have complete orthogonality of purpose in your software. This leads you to having multiple disjoint data structures with very clear trees of ownership — where before you might have had a single more complicated data structure with graphs of multiple ownership. This clear separation of concerns in turn allows for these implementations to be both broadly used and carefully optimized. For an in-depth example of the artful implementation that Rust allows, see Alexis Beingessner’s excellent blog entry on the BTreeMap implementation.
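As a rough sketch of what "multiple disjoint data structures with very clear trees of ownership" can look like, the following keeps one map that owns every element and one ordered set that merely indexes them by key. All of the names here (Rect, Store, by_duration) are hypothetical and are not taken from the statemap source; the sketch also assumes that ids are unique.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, PartialEq)]
struct Rect {
    start: u64,
    duration: u64,
}

/// Two disjoint structures: the map owns every Rect; the set is only an
/// index of (duration, id) keys and owns no Rect at all.
struct Store {
    rects: HashMap<u64, Rect>,
    by_duration: BTreeSet<(u64, u64)>,
}

impl Store {
    fn new() -> Store {
        Store { rects: HashMap::new(), by_duration: BTreeSet::new() }
    }

    /// Assumes `id` is not already present.
    fn insert(&mut self, id: u64, rect: Rect) {
        self.by_duration.insert((rect.duration, id));
        self.rects.insert(id, rect);
    }

    /// Remove and return the shortest rectangle, keeping both in sync.
    fn remove_shortest(&mut self) -> Option<(u64, Rect)> {
        let key = *self.by_duration.iter().next()?;
        self.by_duration.remove(&key);
        let rect = self.rects.remove(&key.1)?;
        Some((key.1, rect))
    }
}

fn main() {
    let mut store = Store::new();
    store.insert(1, Rect { start: 0, duration: 50 });
    store.insert(2, Rect { start: 50, duration: 5 });
    store.insert(3, Rect { start: 55, duration: 20 });
    println!("{:?}", store.remove_shortest());
}
```

In the intrusive C style, the tree linkage would live inside the rectangle itself; here the rectangle knows nothing about the structures that refer to it, at the cost of keeping the two in step by hand.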

All of this adds up to the existential win of Rust: powerful abstractions without sacrificing performance. Does this mean that Rust will always outperform C? No, of course not. But it does mean that you shouldn’t be surprised when it does — and that if you care about performance and you are implementing new software, it is probably past time to give Rust a very serious look!


Falling in love with Rust

Wed, 09/19/2018 - 00:31

Let me preface this with an apology: this is a technology love story, and as such, it’s long, rambling, sentimental and personal. Also befitting a love story, it has a When Harry Met Sally feel to it, in that its origins are inauspicious…
First encounters

Over a decade ago, I worked on a technology to which a competitor paid the highest possible compliment: they tried to implement their own knockoff. Because this was done in the open (and because it is uniquely mesmerizing to watch one’s own work mimicked), I spent way too much time following their mailing list and tracking their progress (and yes, taking an especially shameful delight in their occasional feuds). On their team, there was one technologist who was clearly exceptionally capable — and I confess to being relieved when he chose to leave the team relatively early in the project’s life. This was all in 2005; for years for me, Rust was “that thing that Graydon disappeared to go work on.” From the description as I read it at the time, Graydon’s new project seemed outrageously ambitious — and I assumed that little would ever come of it, though certainly not for lack of ability or effort…

Fast forward eight years to 2013 or so. Impressively, Graydon’s Rust was not only still alive, but it had gathered a community and was getting quite a bit of attention — enough to merit a serious look. There seemed to be some very intriguing ideas, but any budding interest that I might have had frankly withered when I learned that Rust had adopted the M:N threading model — including its more baroque consequences like segmented stacks. In my experience, every system that has adopted the M:N model has lived to regret it — and it was unfortunate to have a promising new system appear to be ignorant of the scarred shoulders that it could otherwise stand upon. For me, the implications were larger than this single decision: I was concerned that it may be indicative of a deeper malaise that would make Rust a poor fit for the infrastructure software that I like to write. So while impressed that Rust’s ambitious vision was coming to any sort of fruition at all, I decided that Rust wasn’t for me personally — and I didn’t think much more about it…

Some time later, a truly amazing thing happened: Rust ripped it out. Rust’s reasoning for removing segmented stacks is a concise but thorough damnation; their rationale for removing M:N is clear-eyed, thoughtful and reflective — but also unequivocal in its resolve. Suddenly, Rust became very interesting: all systems make mistakes, but few muster the courage to rectify them; on that basis alone, Rust became a project worthy of close attention.

So several years later, in 2015, it was with great interest that I learned that Adam started experimenting with Rust. On first read of Adam’s blog entry, I assumed he would end what appeared to be excruciating pain by deleting the Rust compiler from his computer (if not by moving to a commune in Vermont) — but Adam surprised me when he ended up being very positive about Rust, despite his rough experiences. In particular, Adam hailed the important new ideas like the ownership model — and explicitly hoped that his experience would serve as a warning to others to approach the language in a different way.

In the years since, Rust continued to mature and my curiosity (and I daresay, that of many software engineers) has steadily intensified: the more I have discovered, the more intrigued I have become. This interest has coincided with my personal quest to find a programming language for the back half of my career: as I mentioned in my Node Summit 2017 talk on platform as a reflection of values, I have been searching for a language that reflects my personal engineering values around robustness and performance. These values reflect a deeper sense within me: that software can be permanent — that software’s unique duality as both information and machine affords a timeless perfection and utility that stand apart from other human endeavor. In this regard, I have believed (and continue to believe) that we are living in a Golden Age of software, one that will produce artifacts that will endure for generations. Of course, it can be hard to hold such heady thoughts when we seem to be up to our armpits in vendored flotsam, flooded by sloppy abstractions hastily implemented. Among current languages, only Rust seems to share this aspiration for permanence, with a perspective that is decidedly larger than itself.
Taking the plunge

So I have been actively looking for an opportunity to dive into Rust in earnest, and earlier this year, one presented itself: for a while, I have been working on a new mechanism for system visualization that I dubbed statemaps. The software for rendering statemaps needs to inhale a data stream, coalesce it down to a reasonable size, and render it as a dynamic image that can be manipulated by the user. This originally started off as being written in node.js, but performance became a problem (especially for larger data sets) and I did what we at Joyent have done in such situations: I rewrote the hot loop in C, and then dropped that into a node.js add-on (allowing the SVG-rendering code to remain in JavaScript). This was fine, but painful: the C was straightforward, but the glue code to bridge into node.js was every bit as capricious, tedious, and error-prone as it has always been. Given the performance constraint, the desire for the power of a higher level language, and the experimental nature of the software, statemaps made for an excellent candidate to reimplement in Rust; my intensifying curiosity could finally be sated!

As I set out, I had the advantage of having watched (if from afar) many others have their first encounters with Rust. And if those years of being a Rust looky-loo taught me anything, it’s that the early days can be like the first days of snowboarding or windsurfing: lots of painful falling down! So I took a deliberate approach with Rust: rather than do what one is wont to do when learning a new language and tinker a program into existence, I really sat down to learn Rust. This is frankly my bias anyway (I always look for the first principles of a creation, as explained by its creators), but with Rust, I went further: not only did I buy the canonical reference (The Rust Programming Language by Steve Klabnik, Carol Nichols and community contributors), I also bought an O’Reilly book with a bit more narrative (Programming Rust by Jim Blandy and Jason Orendorff). And with this latter book, I did something that I haven’t done since cribbing BASIC programs from Enter magazine back in the day: I typed in the example program in the introductory chapters. I found this to be very valuable: it got the fingers and the brain warmed up while still absorbing Rust’s new ideas — and debugging my inevitable transcription errors allowed me to get some understanding of what it was that I was typing. At the end was something that actually did something, and (importantly), by working with a program that was already correct, I was able to painlessly feel some of the tremendous promise of Rust.

Encouraged by these early (if gentle) experiences, I dove into my statemap rewrite. It took a little while (and yes, I had some altercations with the borrow checker!), but I’m almost shocked about how happy I am with the rewrite of statemaps in Rust. Because I know that many are in the shoes I occupied just a short while ago (namely, intensely wondering about Rust, but also wary of its learning curve — and concerned about the investment of time and energy that climbing it will necessitate), I would like to expand on some of the things that I love about Rust other than the ownership model. This isn’t because I don’t love the ownership model (I absolutely do) or that the ownership model isn’t core to Rust (it is rightfully thought of as Rust’s epicenter), but because I think its sheer magnitude sometimes dwarfs other attributes of Rust — attributes that I find very compelling! In a way, I am writing this for my past self — because if I have one regret about Rust, it’s that I didn’t see beyond the ownership model to learn it earlier.

I will discuss these attributes in roughly the order I discovered them with the (obvious?) caveat that this shouldn’t be considered authoritative; I’m still very much new to Rust, and my apologies in advance for any technical details that I get wrong!
1. Rust’s error handling is beautiful

The first thing that really struck me about Rust was its beautiful error handling — but to appreciate why it so resonated with me requires some additional context. Despite its obvious importance, error handling is something we haven’t really gotten right in systems software. For example, as Dave Pacheco observed with respect to node.js, we often conflate different kinds of errors — namely, programmatic errors (i.e., my program is broken because of a logic error) with operational errors (i.e., an error condition external to my program has occurred and it affects my operation). In C, this conflation is unusual, but you see it with the infamous SIGSEGV signal handler that has been known to sneak into more than one undergraduate project moments before a deadline to deal with an otherwise undebuggable condition. In the Java world, this is slightly more common with the (frowned upon) behavior of catching java.lang.NullPointerException or otherwise trying to drive on in light of clearly broken logic. And in the JavaScript world, this conflation is commonplace — and underlies one of the most serious objections to promises.

Beyond the ontological confusion, error handling suffers from an infamous mechanical problem: for a function that may return a value but may also fail, how is the caller to delineate the two conditions? (This is known as the semipredicate problem after a Lisp construct that suffers from it.) C handles this as it handles so many things: by leaving it to the programmer to figure out their own (bad) convention. Some use sentinel values (e.g., Linux system calls cleave the return space in two and use negative values to denote the error condition); some return defined values on success and failure and then set an orthogonal error code; and of course, some just silently eat errors entirely (or even worse).
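The semipredicate problem is easy to reproduce in any language that allows an in-band failure value. As a small sketch, here is an atoi-style convention written in Rust purely for illustration; the function name is hypothetical.

```rust
/// In-band convention, in the style of C's atoi: return 0 when the input
/// is not a number. Written in Rust only to make the ambiguity visible.
fn parse_or_zero(s: &str) -> i32 {
    s.parse().unwrap_or(0)
}

fn main() {
    let legitimate = parse_or_zero("0");
    let failed = parse_or_zero("not a number");

    // The caller receives the same value for success and for failure,
    // and has no way to tell the two conditions apart.
    assert_eq!(legitimate, failed);
    println!("both calls returned {}", legitimate);
}
```

Any sentinel chosen from inside the range of valid results has this flaw; the conventions listed above are all attempts to carve out a value (or a side channel) that cannot be mistaken for a result.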

C++ and Java (and many other languages before them) tried to solve this with the notion of exceptions. I do not like exceptions: for reasons not dissimilar to Dijkstra’s in his famous admonition against “goto”, I consider exceptions harmful. While they are perhaps convenient from a function signature perspective, exceptions allow errors to wait in ambush, deep in the tall grass of implicit dependencies. When the error strikes, higher-level software may well not know what hit it, let alone from whom — and suddenly an operational error has become a programmatic one. (Java tries to mitigate this sneak attack with checked exceptions, but while well-intentioned, they have serious flaws in practice.) In this regard, exceptions are a concrete example of trading the speed of developing software with its long-term operability. One of our deepest, most fundamental problems as a craft is that we have enshrined “velocity” above all else, willfully blinding ourselves to the long-term consequences of gimcrack software. Exceptions optimize for the developer by allowing them to pretend that errors are someone else’s problem — or perhaps that they just won’t happen at all.

Fortunately, exceptions aren’t the only way to solve this, and other languages take other approaches. Closure-heavy languages like JavaScript afford environments like node.js the luxury of passing an error as an argument — but this argument can be ignored or otherwise abused (and it’s untyped regardless), making this solution far from perfect. And Go uses its support for multiple return values to (by convention) return both a result and an error value. While this approach is certainly an improvement over C, it is also noisy, repetitive and error-prone.

By contrast, Rust takes an approach that is unique among systems-oriented languages: leveraging first algebraic data types — whereby a thing can be exactly one of an enumerated list of types and the programmer is required to be explicit about its type to manipulate it — and then combining it with its support for parameterized types. Together, this allows functions to return one thing that’s one of two types: one type that denotes success and one that denotes failure. The caller can then pattern match on the type of what has been returned: if it’s of the success type, it can get at the underlying thing (by unwrapping it), and if it’s of the error type, it can get at the underlying error and either handle it, propagate it, or improve upon it (by adding additional context) and propagating it. What it cannot do (or at least, cannot do implicitly) is simply ignore it: it has to deal with it explicitly, one way or the other. (For all of the details, see Recoverable Errors with Result.)
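The options open to the caller can be sketched as follows: one function handles the error locally, the other improves on it by adding context before propagating it. This is an illustration only; the function names and the path used in main are hypothetical.

```rust
use std::fs;
use std::io;

/// Handle: fall back to a default when the file cannot be read.
fn read_or_default(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(contents) => contents,
        Err(_) => String::from("default"),
    }
}

/// Improve and propagate: wrap the error with the path that failed,
/// preserving its kind, and hand it back to the caller.
fn read_with_context(path: &str) -> Result<String, io::Error> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        Err(err) => Err(io::Error::new(err.kind(), format!("{}: {}", path, err))),
    }
}

fn main() {
    // Hypothetical path that is assumed not to exist.
    let path = "/nonexistent/statemap-example";

    println!("handled: {}", read_or_default(path));

    match read_with_context(path) {
        Ok(contents) => println!("read {} bytes", contents.len()),
        Err(err) => println!("failed: {}", err),
    }
}
```

In both functions the match must name the Err arm; leaving it out is a compile-time error rather than a latent bug.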

To make this concrete, in Rust you end up with code that looks like this:

    use std::fs::{self, File};
    use std::io;

    fn do_it(filename: &str) -> Result<(), io::Error> {
        let stat = match fs::metadata(filename) {
            Ok(result) => { result },
            Err(err) => { return Err(err); }
        };

        let file = match File::open(filename) {
            Ok(result) => { result },
            Err(err) => { return Err(err); }
        };

        /* ... */

        Ok(())
    }

Already, this is pretty good: it’s cleaner and more robust than multiple return values, return sentinels and exceptions — in part because the type system helps you get this correct. But it’s also verbose, so Rust takes it one step further by introducing the propagation operator: if your function returns a Result, when you call a function that itself returns a Result, you can append a question mark on the call to the function denoting that upon Ok, the result should be unwrapped and the expression becomes the unwrapped thing — and upon Err the error should be returned (and therefore propagated). This is easier seen than explained! Using the propagation operator turns our above example into this:

    fn do_it_better(filename: &str) -> Result<(), io::Error> {
        let stat = fs::metadata(filename)?;
        let file = File::open(filename)?;

        /* ... */

        Ok(())
    }

This, to me, is beautiful: it is robust; it is readable; it is not magic. And it is safe in that the compiler helps us arrive at this and then prevents us from straying from it.

Platforms reflect their values, and I daresay the propagation operator is an embodiment of Rust’s: balancing elegance and expressiveness with robustness and performance. This balance is reflected in a mantra that one hears frequently in the Rust community: “we can have nice things.” Which is to say: while historically some of these values were in tension (i.e., making software more expressive might implicitly be making it less robust or more poorly performing), through innovation Rust is finding solutions that don’t compromise one of these values for the sake of the other.
2. The macros are incredible

When I was first learning C, I was (rightly) warned against using the C preprocessor. But like many of the things that we are cautioned about in our youth, this warning was one that the wise give to the enthusiastic to prevent injury; the truth is far more subtle. And indeed, as I came of age as a C programmer, I not only came to use the preprocessor, but to rely upon it. Yes, it needed to be used carefully — but in the right hands it could generate cleaner, better code. (Indeed, the preprocessor is very core to the way we implement DTrace’s statically defined tracing.) So if anything, my problems with the preprocessor were not its dangers so much as its many limitations: because it is, in fact, a preprocessor and not built into the language, there were all sorts of things that it would never be able to do — like access the abstract syntax tree.

With Rust, I have been delighted by its support for hygienic macros. This not only solves the many safety problems with preprocessor-based macros, it allows them to be outrageously powerful: with access to the AST, macros are afforded an almost limitless expansion of the syntax — but invoked with an indicator (a trailing bang) that makes it clear to the programmer when they are using a macro. For example, one of the fully worked examples in Programming Rust is a json! macro that allows for JSON to be easily declared in Rust. This gets to the ergonomics of Rust, and there are many macros (e.g., format!, vec!, etc.) that make Rust more pleasant to use.
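
To see what hygiene buys in practice, here is a minimal sketch using a declarative macro. The add_offset! macro and the demo function are made up for illustration. The macro declares a local named offset, and the caller passes in its own variable that is also named offset; with a C-style textual preprocessor the two would collide, whereas here they remain distinct bindings.

```rust
/// Hypothetical macro: adds a fixed offset to the given expression.
/// The `offset` declared here is private to the macro expansion.
macro_rules! add_offset {
    ($x:expr) => {{
        let offset = 10;
        $x + offset
    }};
}

fn demo() -> i32 {
    // The caller's `offset` is not shadowed by the macro's `offset`.
    let offset = 1;
    add_offset!(offset)
}
```

Here demo() evaluates to 11 (the caller's 1 plus the macro's 10), where a purely textual substitution would have produced 10 + 10.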

Another advantage of macros: they are so flexible and powerful that they allow for effective experimentation. For example, the propagation operator that I love so much actually started life as a try! macro; that this macro was being used ubiquitously (and successfully) allowed a language-based solution to be considered. Languages can be (and have been!) ruined by too much experimentation happening in the language rather than in how it’s used; through its rich macros, it seems that Rust can enable the core of the language to remain smaller — and to make sure that when it expands, it is for the right reasons and in the right way.
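
As a rough sketch of how that experimentation can work, a try!-style macro can be written in a handful of lines. The my_try! macro below is an approximation written for illustration (not the standard library's source), and the two sum functions are hypothetical; they show the macro form and the operator form side by side.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for the old try! macro: unwrap on Ok,
/// return early (converting the error type) on Err.
macro_rules! my_try {
    ($expr:expr) => {
        match $expr {
            Ok(value) => value,
            Err(err) => return Err(From::from(err)),
        }
    };
}

fn sum_with_macro(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = my_try!(a.parse::<i32>());
    let y = my_try!(b.parse::<i32>());
    Ok(x + y)
}

fn sum_with_operator(a: &str, b: &str) -> Result<i32, ParseIntError> {
    Ok(a.parse::<i32>()? + b.parse::<i32>()?)
}
```

Both functions behave the same way on success and on failure; the operator is simply the idea with less ceremony.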
3. format! is a pleasure

Okay, this is a small one but it’s (another) one of those little pleasantries that has made Rust really enjoyable. Many (most? all?) languages have an approximation or equivalent of the venerable sprintf, whereby variable input is formatted according to a format string. Rust’s variant of this is the format! macro (which is in turn invoked by println!, panic!, etc.), and (in keeping with one of the broader themes of Rust) it feels like it has learned from much that came before it. It is type-safe (of course) but it is also clean in that the {} format specifier can be used on any type that implements the Display trait. I also love that the {:?} format specifier denotes that the argument’s Debug trait implementation should be invoked to print debug output. More generally, all of the format specifiers map to particular traits, allowing for an elegant approach to an historically grotty problem. There are a bunch of other niceties, and it’s all a concrete example of how Rust uses macros to deliver nice things without sullying syntax or otherwise special-casing. None of the formatting capabilities are unique to Rust, but that’s the point: in this (small) domain (as in many) Rust feels like a distillation of the best work that came before it. As anyone who has had to endure one of my talks can attest, I believe that appreciating history is essential both to understand our present and to map our future. Rust seems to have that perspective in the best ways: it is reverential of the past without being incarcerated by it.
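
A small sketch of the specifier-to-trait mapping: {} goes through Display, {:?} through Debug, and {:02x} through LowerHex. The Rgb type and the describe function are hypothetical, chosen only to keep the example short.

```rust
use std::fmt;

/// Hypothetical color type; Debug is derived, Display is hand-written.
#[derive(Debug)]
struct Rgb(u8, u8, u8);

impl fmt::Display for Rgb {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // {:02x} formats each u8 via the LowerHex trait, zero-padded to two digits.
        write!(f, "#{:02x}{:02x}{:02x}", self.0, self.1, self.2)
    }
}

/// Returns the Display rendering and the Debug rendering of a color.
fn describe(color: &Rgb) -> (String, String) {
    (format!("{}", color), format!("{:?}", color))
}
```

For Rgb(0, 255, 255), the Display form is "#00ffff" and the derived Debug form is "Rgb(0, 255, 255)". Using {} on a type that does not implement Display is rejected at compile time.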
4. include_str! is a godsend

One of the filthy aspects of the statemap code is that it is effectively encapsulating another program — a JavaScript program that lives in the SVG to allow for the interactivity of the statemap. This code lives in its own file, which the statemap code should pass through to the generated SVG. In the node.js/C hybrid, I am forced to locate the file in the filesystem — which is annoying because it has to be delivered along with the binary and located, etc. Now Rust — like many languages (including ES6) — has support for raw-string literals. As an aside, it’s interesting to see the discussion leading up to its addition, and in particular, how a group of people really looked at every language that does this to see what should be mimicked versus what could be improved upon. I really like the syntax that Rust converged on: r followed by zero or more octothorpes followed by a quote to begin a raw string literal, and a quote followed by a matching number of octothorpes to end a literal, e.g.:

let str = r##""What a curious feeling!" said Alice"##;

This alone would have allowed me to do what I want, but it’s still a tad gross in that it’s a bunch of JavaScript living inside a raw literal in a .rs file. Enter include_str!, which allows me to tell the compiler to find the specified file in the filesystem during compilation, and statically drop it into a string variable that I can manipulate:

...
/*
 * Now drop in our in-SVG code.
 */
let lib = include_str!("statemap-svg.js");
...

So nice! Over the years I have wanted this many times over for my C, and it’s another one of those little (but significant!) things that make Rust so refreshing.
5. Serde is stunningly good

Serde is a Rust crate that allows for serialization and deserialization, and it’s just exceptionally good. It uses macros (and, in particular, Rust’s procedural macros) to generate structure-specific routines for serialization and deserialization. As a result, Serde requires remarkably little programmer lift to use and performs eye-wateringly well — a concrete embodiment of Rust’s repeated defiance of the conventional wisdom that programmers must choose between abstractions and performance!

For example, in the statemap implementation, the input is concatenated JSON that begins with a metadata payload. To read this payload in Rust, I define the structure, and denote that I wish to derive the Deserialize trait as implemented by Serde:

#[derive(Deserialize, Debug)]
#[allow(non_snake_case)]
struct StatemapInputMetadata {
    start: Vec<u64>,
    title: String,
    host: Option<String>,
    entityKind: Option<String>,
    states: HashMap<String, StatemapInputState>,
}

Then, to actually parse it:

let metadata: StatemapInputMetadata = serde_json::from_str(payload)?;

That’s… it. Thanks to the magic of the propagation operator, the errors are properly handled and propagated — and it has handled tedious, error-prone things for me like the optionality of certain members (itself beautifully expressed via Rust’s ubiquitous Option type). With this one line of code, I now (robustly) have a StatemapInputMetadata instance that I can use and operate upon — and this performs incredibly well on top of it all. In this regard, Serde represents the best of software: it is a sophisticated, intricate implementation making available elegant, robust, high-performing abstractions; as legendary White Sox play-by-play announcer Hawk Harrelson might say, MERCY!
6. I love tuples

In my C, I have been known to declare anonymous structures in functions. More generally, in any strongly typed language, there are plenty of times when you don’t want to have to fill out paperwork to be able to structure your data: you just want a tad more structure for a small job. For this, Rust borrows an age-old construct from ML in tuples. Tuples are expressed as a parenthetical list, and they basically work as you expect them to work in that they are static in size and type, and you can index into any member. For example, in some test code that needs to make sure that names for colors are correctly interpreted, I have this:

let colors = vec![
    ("aliceblue", (240, 248, 255)),
    ("antiquewhite", (250, 235, 215)),
    ("aqua", (0, 255, 255)),
    ("aquamarine", (127, 255, 212)),
    ("azure", (240, 255, 255)),
    /* ... */
];

Then colors[2].0 (say) will be the string “aqua”; (colors[1].1).2 will be the integer 215. Don’t let the absence of a type declaration in the above deceive you: tuples are strongly typed, it’s just that Rust is inferring the type for me. So if I accidentally try to (say) add an element to the above vector that contains a tuple of mismatched signature (e.g., the tuple “((188, 143, 143), ("rosybrown"))”, which has the order reversed), Rust will give me a compile-time error.

The full integration of tuples makes them a joy to use. For example, if a function returns a tuple, you can easily assign its constituent parts to disjoint variables, e.g.:

fn get_coord() -> (u32, u32) {
    (1, 2)
}

fn do_some_work() {
    let (x, y) = get_coord();
    /* x has the value 1, y has the value 2 */
}

Great stuff!

7. The integrated testing is terrific

One of my regrets on DTrace is that we didn’t start on the DTrace test suite at the same time we started the project. And even after we started building it (too late, but blessedly before we shipped it), it still lived away from the source for several years. And even now, it’s a bit of a pain to run — you really need to know it’s there.

This represents everything that’s wrong with testing in C: because it requires bespoke machinery, too many people don’t bother — even when they know better! Viz.: in the original statemap implementation, there is zero testing code — and not because I don’t believe in it, but just because it was too much work for something relatively small. Yes, there are plenty of testing frameworks for C and C++, but in my experience, the integrated frameworks are too constrictive — and again, not worth it for a smaller project.

With the rise of test-driven development, many languages have taken a more integrated approach to testing. For example, Go has a rightfully lauded testing framework, Python has unittest, etc. Rust takes a highly integrated approach that combines the best of all worlds: test code lives alongside the code that it’s testing — but without having to make the code bend to a heavyweight framework. The workhorses here are conditional compilation and Cargo, which together make it so easy to write tests and run them that I found myself doing true test-driven development with statemaps — namely writing the tests as I develop the code.
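
For a sense of how little machinery this takes, here is a minimal sketch of a function with its tests living in the same file. The color_by_name function is hypothetical (it is not the statemap color code); the #[cfg(test)] and #[test] attributes are the standard ones.

```rust
/// Hypothetical helper: look up an RGB triple by color name.
fn color_by_name(name: &str) -> Option<(u8, u8, u8)> {
    match name {
        "aqua" => Some((0, 255, 255)),
        "azure" => Some((240, 255, 255)),
        _ => None,
    }
}

// Compiled only when building tests, so it adds nothing to a normal build.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn known_colors_resolve() {
        assert_eq!(color_by_name("aqua"), Some((0, 255, 255)));
    }

    #[test]
    fn unknown_colors_are_none() {
        assert_eq!(color_by_name("not-a-color"), None);
    }
}
```

Running cargo test builds the tests module and runs every function marked #[test]; a regular cargo build leaves the module out entirely.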
8. The community is amazing

In my experience, the best communities are ones that are inclusive in their membership but resolute in their shared values. When communities aren’t inclusive, they stagnate, or rot (or worse); when communities don’t share values, they feud and fracture. This can be a very tricky balance, especially when so many open source projects start out as the work of a single individual: it’s very hard for a community not to reflect the idiosyncrasies of its founder. This is important because in the open source era, community is critical: one is selecting a community as much as one is selecting a technology, as each informs the future of the other. One factor that I value a bit less is strictly size: some of my favorite communities are small ones — and some of my least favorite are huge.

For purposes of a community, Rust has a luxury of clearly articulated, broadly shared values that are featured prominently and reiterated frequently. If you head to the Rust website this is the first sentence you’ll read:

Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.

That gets right to it: it says that as a community, we value performance and robustness — and we believe that we shouldn’t have to choose between these two. (And we have seen that this isn’t mere rhetoric, as so many Rust decisions show that these values are truly the lodestar of the project.)

And with respect to inclusiveness, it is revealing that you will likely read that statement of values in your native tongue, as the Rust web page has been translated into thirteen languages. Just the fact that it has been translated into so many languages makes Rust nearly unique among its peers. But perhaps more interesting is where this globally inclusive view likely finds its roots: among the sites of its peers, only Ruby is similarly localized. Given that several prominent Rustaceans like Steve Klabnik and Carol Nichols came from the Ruby community, it would not be unreasonable to guess that they brought this globally inclusive view with them. This kind of inclusion is one that one sees again and again in the Rust community: different perspectives from different languages and different backgrounds. Those who come to Rust bring with them their experiences — good and bad — from the old country, and the result is a melting pot of ideas. This is an inclusiveness that runs deep: by welcoming such disparate perspectives into a community and then uniting them with shared values and a common purpose, Rust achieves a rich and productive heterogeneity of thought. That is, because the community agrees about the big things (namely, its fundamental values), it has room to constructively disagree (that is, achieve consensus) on the smaller ones.

Which isn’t to say this is easy! Check out Ashley Williams in the opening keynote from RustConf 2018 for how exhausting it can be to hash through these smaller differences in practice. Rust has taken a harder path than the “traditional” BDFL model, but it’s a qualitatively better one — and I believe that many of the things that I love about Rust are a reflection of (and a tribute to) its robust community.
9. The performance rips

Finally, we come to the last thing I discovered in my Rust odyssey — but in many ways, the most important one. As I described in an internal presentation, I had experienced some frustrations trying to implement in Rust the same structure I had had in C. So I mentally gave up on performance, resolving to just get something working first, and then optimize it later.

I did get it working, and was able to benchmark it, but to give some context for the numbers, here is the time to generate a statemap in the old (slow) pure node.js implementation for a modest trace (229M, ~3.9M state transitions) on my 2.9 GHz Core i7 laptop:

% time ./statemap-js/bin/statemap ./pg-zfs.out > js.svg

real    1m23.092s
user    1m21.106s
sys     0m1.871s

This is bad — and larger input will cause it to just run out of memory. And here’s the version as reimplemented as a C/node.js hybrid:

% time ./statemap-c/bin/statemap ./pg-zfs.out > c.svg

real    0m11.800s
user    0m11.414s
sys     0m0.330s

This was (as designed) a 10X improvement in performance, and represents speed-of-light numbers in that this seems to be an optimal implementation. Because I had written my Rust naively (and my C carefully), my hope was that the Rust would be no more than 20% slower — but I was braced for pretty much anything. Or at least, I thought I was; I was actually genuinely taken aback by the results:

$ time ./statemap.rs/target/release/statemap ./pg-zfs.out > rs.svg
3943472 records processed, 24999 rectangles

real    0m8.072s
user    0m7.828s
sys     0m0.186s

Yes, you read that correctly: my naive Rust was ~32% faster than my carefully implemented C. This blew me away, and in the time since, I have spent some time on a real lab machine running SmartOS (where I have reproduced these results and been able to study them a bit). My findings are going to have to wait for another blog entry, but suffice it to say that despite executing a shockingly similar number of instructions, the Rust implementation has a different load/store mix (it is much more store-heavy than C) — and is much better behaved with respect to the cache. Given the degree to which Rust passes by value, this makes some sense, but much more study is merited.

It’s also worth mentioning that there are some easy wins that will make the Rust implementation even faster: after I had publicized the fact that I had a Rust implementation of statemaps working, I was delighted when David Tolnay, one of the authors of Serde, took the time to make some excellent suggestions for improvement. For a newcomer like me, it’s a great feeling to have someone with such deep expertise as David’s take an interest in helping me make my software perform even better — and it is revealing as to the core values of the community.

Rust’s shockingly good performance — and the community’s desire to make it even better — fundamentally changed my disposition towards it: instead of seeing Rust as a language to augment C and replace dynamic languages, I’m looking at it as a language to replace both C and dynamic languages in all but the very lowest layers of the stack. C — like assembly — will continue to have a very important place for me, but it’s hard to not see that place as getting much smaller relative to the barnstorming performance of Rust!
Beyond the first impressions

I wouldn’t want to imply that this is an exhaustive list of everything that I have fallen in love with about Rust. That list is much longer, and would include at least the ownership model; the trait system; Cargo; the type inference system. And I feel like I have just scratched the surface; I haven’t waded into known strengths of Rust like the FFI and the concurrency model! (Despite having written plenty of multithreaded code in my life, I haven’t so much as created a thread in Rust!)
Building a future

I can say with confidence that my future is in Rust. As I have spent my career doing OS kernel development, a natural question would be: do I intend to rewrite the OS kernel in Rust? In a word, no. To understand my reluctance, take some of my most recent experience: this blog entry was delayed because I needed to debug (and fix) a nasty problem with our implementation of the Linux ABI. As it turns out, Linux and SmartOS make slightly different guarantees with respect to the interaction of vfork and signals, and our code was fatally failing on a condition that should be impossible. Any old Unix hand (or quick study!) will tell you that vfork and signal disposition are each semantic superfund sites in their own right — and that their horrific (and ill-defined) confluence can only be unimaginably toxic. But the real problem is that actual software implicitly depends on these semantics — and any operating system that is going to want to run existing software will itself have to mimic them. You don’t want to write this code, because no one wants to write this code.

Now, one option (which I honor!) is to rewrite the OS from scratch, as if legacy applications essentially didn’t exist. While there is a tremendous amount of good that can come out of this (and it can find many use cases), it’s not a fit for me personally.

So while I may not want to rewrite the OS kernel in Rust, I do think that Rust is an excellent fit for much of the broader system. For example, at the recent OpenZFS Developers Summit, Matt Ahrens and I were noodling the notion of user-level components for ZFS in Rust. Specifically: zdb is badly in need of a rewrite — and Rust would make an excellent candidate for it. There are many such examples spread throughout ZFS and the broader system, including a few in kernel. Might we want to have a device driver model that allows for Rust drivers? Maybe! (And certainly, it’s technically possible.) In any case, you can count on a lot more Rust from me into the indefinite future — whether in the OS, near the OS, or above the OS.
Taking your own plunge

I wrote all of this up in part to not only explain why I took the plunge, but to encourage others to take their own. If you were as I was and are contemplating diving into Rust, a couple of pieces of advice, for whatever they’re worth:

  • I would recommend getting both The Rust Programming Language and Programming Rust. They are each excellent in their own right, and different enough to merit owning both. I also found it very valuable to have two different sources on subjects that were particularly thorny.
  • Understand ownership before you start to write code. The more you understand ownership in the abstract, the less you’ll have to learn at the merciless hands of compiler error messages.
  • Get in the habit of running rustc on short programs. Cargo is terrific, but I personally have found it very valuable to write short Rust programs to understand a particular idea — especially when you want to understand optional or new features of the compiler. (Roll on, non-lexical lifetimes!)
  • Be careful about porting something to Rust as a first project — or otherwise implementing something you’ve implemented before. Now, obviously, this is exactly what I did, and it can certainly be incredibly valuable to be able to compare an implementation in Rust to an implementation in another language — but it can also cut against you: the fact that I had implemented statemaps in C sent me down some paths that were right for C but wrong for Rust; I made much better progress when I rethought the implementation of my problem the way Rust wanted me to think about it.
  • Check out the New Rustacean podcast by Chris Krycho. I have really enjoyed Chris’s podcasts, and have been working my way through them when commuting or doing household chores. I particularly enjoyed his interview with Sean Griffin and his interview with Carol Nichols.
  • Check out rustlings. I learned about this a little too late for me; I wish I had known about it earlier! I did work through the Rust koans, which I enjoyed and would recommend for the first few hours with Rust.

I’m sure that there’s a bunch of stuff that I missed; if there’s a particular resource that you found useful when learning Rust, message me or leave a comment here and I’ll add it.

Let me close by offering a sincere thanks to those in the Rust community who have been working so long to develop such a terrific piece of software — and especially those who have worked so patiently to explain their work to us newcomers. You should be proud of what you’ve accomplished, both in terms of a revolutionary technology and a welcoming community — thank you for inspiring so many of us about what infrastructure software can become, and I look forward to many years of implementing in Rust!
