Update on Sentiment Analysis of FOSS communities

One of my goals with my new open source project, FOSS Heartbeat, has been to measure the overall sentiment of communication in open source communities. Are the communities welcoming and friendly, hostile, or neutral? Does the bulk of positive or negative sentiment come from core contributors or outsiders? In order to make this analysis scale across multiple open source communities with years of logs, I needed to be able to train an algorithm to recognize the sentiment or tone of technical conversation.

How can machine learning recognize human language sentiment?

One of the projects I’ve been using is the Stanford CoreNLP library, an open source Natural Language Processing (NLP) project. The Stanford CoreNLP takes a set of training sentences (manually marked so that each word and each combined phrase has a sentiment) and it trains a neural network to recognize the sentiment.
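If you want to poke at a sentiment model yourself, here is a minimal sketch of how you might query one; it assumes you have a Stanford CoreNLP server running locally on port 9000 with the sentiment annotator enabled, and it is an illustration rather than part of FOSS Heartbeat.

import json
import requests  # assumes the requests package is installed

# Assumes a Stanford CoreNLP server is already running locally, for example:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
CORENLP_URL = "http://localhost:9000"

def sentence_sentiments(text):
    """Return (sentence, sentiment) pairs as labeled by the loaded sentiment model."""
    properties = {"annotators": "tokenize,ssplit,parse,sentiment", "outputFormat": "json"}
    response = requests.post(CORENLP_URL,
                             params={"properties": json.dumps(properties)},
                             data=text.encode("utf-8"))
    response.raise_for_status()
    return [(" ".join(token["word"] for token in sentence["tokens"]), sentence["sentiment"])
            for sentence in response.json()["sentences"]]

print(sentence_sentiments("Thanks for the feedback! This patch breaks the build."))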

The problem with any form of artificial intelligence is that the input into the machine is always biased in some way. For the Stanford CoreNLP, their default sentiment model was trained on movie reviews. That means, for example, that the default sentiment model thinks “Christian” is a very positive word, whereas in an open source project that’s probably someone’s name. The default sentiment model also consistently marks any sentence expressing a neutral technical opinion as having a negative tone. Most people leaving movie reviews either hate or love the movie, and people are unlikely to leave a neutral review analyzing the technical merits of the special effects. Thus, it makes sense that a sentiment model trained on movie reviews would classify technical opinions as negative.

Since the Stanford CoreNLP default sentiment model doesn’t work well on technical conversation, I’ve been creating a new set of sentiment training data that only uses sentences from open source projects. That means that I have to manually modify the sentiment of words and phrases in thousands of sentences that I feed into the new sentiment model. Yikes!

As of today, the Stanford CoreNLP default sentiment model has ~8,000 sentences in its training file. I currently have ~1,200 sentences. While my model isn’t as consistent as the Stanford CoreNLP one, it is better at recognizing neutral and positive tone in technical sentences. If you’re interested in the technical details (e.g. specificity, recall, false positives, and the like), you can take a look at the new sentiment model’s stats. This blog post will attempt to present the results without diving into supervised machine learning jargon.
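For readers who want to compute those kinds of numbers on their own labeled sentences, here is a minimal sketch using scikit-learn; the gold and predicted labels below are made up for illustration and are not the actual FOSS Heartbeat evaluation data.

from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical hand-labeled ("gold") sentiments and model predictions for the
# same six sentences; the real evaluation uses the training data described above.
gold      = ["neutral", "positive", "neutral", "negative", "positive", "neutral"]
predicted = ["neutral", "positive", "negative", "neutral", "positive", "neutral"]

labels = ["negative", "neutral", "positive"]
print(confusion_matrix(gold, predicted, labels=labels))
print(classification_report(gold, predicted, labels=labels))  # per-class precision and recall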

Default vs New Models On Positive Tone

Let’s take a look at an example of a positive code review experience. The left column is from the default sentiment model in Stanford CoreNLP, which was trained on movie reviews. The right column is from the new sentiment model I’ve been training. The color of each sentence encodes what the two models think its overall tone is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Hey @1Niels 🙂 is there a particular reason for calling it Emoji Code?

I think the earlier guide called it emoji name.

A few examples here would help, as well as explaining that the pop-up menu shows the first five emojis whose names contain the letters typed.

(I’m sure you have a better way of explaining this than me :-).

@arpith I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.

I think I will probably change the section name from Emoji Code to Using emoji codes and I’ll include your suggestion in the last step.

Thanks for the feedback!

The default model trained on movie reviews rated 4 of the 7 sentences as negative and 1 as positive. As you can see, it tends to classify neutral technical talk as having a negative tone, including sentences like “I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.” It did recognize the sentence “Thanks for the feedback!” as positive, which is good.

The new model trained on comments from open source projects rated 1 sentence as negative, 2 as positive, and 1 as very positive. Most of the positive tone in this example comes from the use of smiley faces, which I’ve been careful to train the new model to recognize. Additionally, I’ve been teaching it that an exclamation point at the end of an overall positive sentence shifts the tone to very positive. I’m pleased to see it pick up on those subtleties.

Default vs New Models On Neutral Tone

Let’s have a look at a neutral tone code review example. Again, the sentence sentiment color key is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

This seems to check resolvers nested up to a fixed level, rather than checking resolvers and namespaces nested to an arbitrary depth.

I think a inline-code is more appropriate here, something like “URL namespace {} is not unique, you may not be able to reverse all URLs in this namespace”.

Errors prevent management commands from running, which is a bit severe for this case.

One of these should have an explicit instance namespace other than inline-code, otherwise the nested namespaces are not unique.

Please document the check in inline-code.

There’s a list of URL system checks at the end.

Again, the default sentiment model trained on movie reviews classifies neutral review comments as negative, marking 5 out of 6 sentences as negative.

The new model trained on open source communication is a bit mixed on this example, marking 1 of the 6 sentences as positive and 1 as negative. Still, it correctly marked the remaining 4 as neutral, which is pretty good, given that the new model’s training set is roughly one-eighth the size of the movie review set.

Default vs New Models On Negative Tone

Let’s take a look at a negative example. Please note that this is not a community I am involved in, and I don’t know anyone from that community. I found this particular example by searching for “code of conduct”. Note that the behavior displayed in the thread caused the initial contributor to offer to abandon their pull request. A project outsider stated they would recommend their employer not use the project because of the behavior. Another project member came along to ask for people to be more friendly. So quite a number of people thought this behavior was problematic.

Again, the sentiment color code is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Dude, you must be kidding everyone.

What dawned on you – that for a project to be successful and useful it needs confirmed userbase – was crystal clear to others years ago.

Your “hard working” is little comparing to what other people have been doing for years.

Get humbler, Mr. Arrogant.

If you find this project great, figure out that it is so because other people worked on it before.

Learn what they did and how.

But first learn Python, as pointed above.

Then keep working hard.

And make sure the project stays great after you applied your hands to it.

The default model trained on movie reviews classifies 4 out of 9 sentences as negative and 2 as positive. The new model classifies 2 out of 9 sentences as negative and 2 as positive. In short, it needs more work.

It’s unsurprising that the new model doesn’t recognize negative sentiment very well yet, since I’ve been focusing on making sure it can recognize positive sentiment and neutral talk. The training set currently has 110 negative sentences out of 1205 sentences total. I simply need more negative examples, and they’re hard to find because many subtle personal attacks, insults, and slights don’t use curse words. If you look at the example above, there are no good search terms aside from the word “arrogant”, even though the sentences are still put-downs that create an us-vs-them mentality. Despite not using slurs or curse words, many people found the thread problematic.

The best way I’ve settled on to find negative sentiment examples is to look for “communication meta words” or people talking about communication style. My current list of search terms includes words like “friendlier”, “flippant”, “abrasive”, and similar. Some search words like “aggressive” yield too many false positives, because people talk about things like “aggressive optimization”. Once I’ve found a thread that contains those words, I’ll read through it and find the comments that caused the people to ask for a different communication style. Of course, this only works for communities that want to be welcoming. For other communities, searching for the word “attitude” seems to yield useful examples.
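To make that search process concrete, here is a rough sketch of the kind of filter I mean; the comment list and the exact word lists are illustrative rather than the actual FOSS Heartbeat code.

import re

# Words that suggest a comment is about communication style itself.
META_WORDS = ["friendlier", "flippant", "abrasive", "attitude", "aggressive"]
# Phrases that make otherwise useful search words fire on purely technical talk.
FALSE_POSITIVES = ["aggressive optimization", "aggressive inlining"]

def looks_like_meta_discussion(comment):
    """Flag comments that mention communication style, for manual review."""
    text = comment.lower()
    if any(phrase in text for phrase in FALSE_POSITIVES):
        return False
    return any(re.search(r"\b" + word + r"\b", text) for word in META_WORDS)

comments = [
    "Could we keep the tone a bit friendlier here?",
    "This patch enables aggressive optimization of the inner loop.",
]
print([c for c in comments if looks_like_meta_discussion(c)])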

Still, it’s a lot of manual labor to identify problematic threads and fish out the negative sentences that are in those threads. I’ll be continuing to make progress on improving the model to recognize negative sentiment, but it would help if people could post links to negative sentiment examples on the FOSS Heartbeat github issue or drop me an email.

Visualizing Sentiment

Although the sentiment model isn’t perfect, I’ve added sentiment visualizations for several communities on FOSS Heartbeat, including 24pullrequests, Dreamwidth, systemd, elm, fsharp, and opal.

The x-axis is the date. I used the number of neutral comments in an issue or pull request as the y-axis coordinate, with the error bars indicating the number of positive and negative comments. If an issue or pull request had twice as many negative comments as positive comments, it was marked as a negative thread. If it had twice as many positive comments as negative comments, it was marked as positive. If neither sentiment won and more than 80% of the comments were neutral, it was marked as neutral. Otherwise the issue or pull request was marked as mixed sentiment.
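Expressed as code, that classification rule looks roughly like this (a sketch, not the exact FOSS Heartbeat implementation):

def classify_thread(positive, negative, neutral):
    """Classify an issue or pull request from its per-comment sentiment counts."""
    total = positive + negative + neutral
    if negative >= 2 * positive and negative > 0:
        return "negative"
    if positive >= 2 * negative and positive > 0:
        return "positive"
    if total and neutral / total > 0.8:
        return "neutral"
    return "mixed"

print(classify_thread(positive=6, negative=1, neutral=14))  # -> "positive"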

Here’s an example:

24pullrequests-sentiment

The sentiment graph is from the 24pullrequests repository. It’s a Ruby website that encourages programmers to gift code to open source projects during the 24 days in December before Christmas. One of the open source projects you can contribute to is the 24 pull requests site itself (isn’t that meta!). During the year, you’ll see the site admins filing help-wanted enhancements to update the software that runs the website or tweak a small feature. They’re usually closed within a day without a whole lot of back and forth between the main contributors. These mid-year contributions show up as the neutral, low-comment dots throughout the year. When the site admins do receive a gift of code to the website from a new contributor during the 24 pull requests period, they’re quite thankful, which you can see reflected in the many positive comments around December and January.

Another interesting example to look at is negative sentiment in the opal community:

opal-negative-sentiment

That large spike with 1207 neutral comments, 197 positive comments, and 441 negative comments is the opal community issue to add a code of conduct. Being able to quickly see which threads are turning into flamewars would be helpful to community managers and maintainers who have been ignoring the issue tracker to get some coding done. Once the sentiment model is better trained, I would love to analyze whether communities become more positive or more neutral after a Code of Conduct is put in place. Tying that data to whether more or fewer newcomers participate after a Code of Conduct is in place may be interesting as well.

There are a lot of real-world problems that sentiment analysis, participation data, and a bit of psychology could help us identify. One common social problem is burnout, which is characterized by an increased workload (stages 1 & 2), working at odd hours (stage 3), and an increase in negative sentiment (stage 6). We have participation data, comment timestamps, and sentiment for those comments, so we would only need some examples of burnout to identify the pattern. By being aware of the burnout stages of our collaborators, we could intervene early to help them avoid a spiral into depression.

A more corporate-focused use might be to identify issues where key customers express frustration and anger, and focus developers on fixing the squeaky wheel. If FOSS Heartbeat were extended to analyze comments on mailing lists, Slack, Discourse, or Mattermost, companies could get a general idea of customer sentiment after a new software release. Companies could also use the participation data and the data about who is merging code to figure out which projects or parts of their code are not being well-maintained, and assign additional help, as the exercism community did.

Another topic of interest to communities hoping to grow their developer base would be identifying the key factors that cause newcomers to become more active contributors to a project. Is it a positive welcome? A mentor suggesting a newcomer tackle a medium-sized issue by tagging them? Does adding documentation about a particularly confusing area cause more newcomers to submit pull requests to that area of code? Does code review from a particularly friendly person cause newcomers to want to come back? Or maybe code review lag causes them to drop off?

These are the kinds of people-centric community questions I would love to answer by using FOSS Heartbeat. I would like to thank Mozilla for sponsoring the project for the last three months. If you have additional questions you’d love to see FOSS Heartbeat answer, I’m available for contract work through Otter Tech. If you’re thankful about the work I’ve put in so far, you can support me through my patreon.

What open source community question would you like to see FOSS Heartbeat tackle? Feel free to leave a comment.

Impact of bots on github communities

I’ve been digging into contributor statistics for various communities on github as part of my work on FOSS Heartbeat, a project to measure the health of open source communities.

It’s fascinating to see bots show up in the contributor statistics. For example, if you look at the github users who comment on issues in the Rust community, you’ll quickly notice two contributors who interact a lot:

rust-bots

bors is a bot that runs pull requests through the Rust continuous integration test suite, and automatically merges the code into the master branch if it passes. bors responds to commands issued in pull request comments (of the form ‘@bors r+ [commit ID]’) by community members with permission to merge code into rust-lang/rust.

rust-highfive is a bot that recommends a reviewer based on the contents of the pull request. It then adds a comment that tags the reviewer, who will get a github notification (and possibly an email, if they have that set up).

Both bots have been set up by the Rust community in order to make pull request review smoother. bors is designed to cut down the amount of time developers need to spend running the test suite on code that’s ready to be merged. rust-highfive is designed to make sure the right person is aware of pull requests that may need their experienced eye.

But just how effective are these github bots? Are they really helping the Rust community or are they just causing more noise?

Chances of a successful pull request

bors merged its first pull request on 2013-02-02. The year before bors was introduced, only 330 out of 503 pull requests were merged. The year after, 1574 out of 2311 pull requests were merged. So the Rust community had more than four times as many pull requests to review.

Assuming that the tests bors used were some of the same tests rust developers were running manually, we would expect that pull requests would be rejected at about the same rate (or maybe rejected more, since the automatic CI system would catch more bugs).

To test that assumption, we turn to a statistics method called the Chi squared test. It helps answer the question, “Is there a difference in the success rates of two samples?” In our case, it helps us answer the question, “After bors was used, did the percentage of accepted pull requests change?”

rust-bors-merged

It looks like there’s no statistical difference in the chances of getting a random pull request merged before or after bors started participating. That’s pretty good, considering the number of pull requests submitted quadrupled.
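If you want to reproduce the comparison, the chi-squared test takes only a few lines of SciPy using the merge counts quoted above (this is a sketch of the method, not the exact FOSS Heartbeat code):

from scipy.stats import chi2_contingency

# Contingency table of [merged, not merged] counts for the year before
# and the year after bors, using the pull request numbers quoted above.
before = [330, 503 - 330]
after = [1574, 2311 - 1574]

chi2, p_value, dof, expected = chi2_contingency([before, after])
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
# A p-value well above 0.01 means we cannot claim, at 99% confidence, that
# the merge rate changed after bors was introduced.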

Now, what about rust-highfive? Since the bot is supposed to recommend pull request reviewers, we would hope that pull requests would have a higher chance of getting accepted. Let’s look at the chances of getting a pull request merged for the year before and the year after rust-highfive was introduced (2014-09-18).

rust-highfive-merged

So yes, it does seem like rust-highfive is effective at getting the right developer to notice a pull request they need to review and merge.

Impact on time a pull request is open

One of the hopes of a programmer who designs a bot is that it will cut down on the amount of time that the developer has to spend on simple repetitive tasks. A bot like bors is designed to run the CI suite automatically, leaving the developer more time to do other things, like review other pull requests. Maybe that means pull requests get merged faster?

To test the impact of bors on the amount of time a pull request is open, we turn to the Two-means hypothesis test. It tells you whether there’s a statistical difference between the means of two different data sets. In our case, we compare the length of time a pull request is open. The two populations are the pull requests a year before and a year after bors was introduced.

rust-bors-pr-open

We would hope to see the average open time of a pull request go down after bors was introduced, but that’s not what the data shows. The graph shows the average open time actually increased, by 1.1 days.
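As an illustration of the method, the comparison looks something like this in SciPy; the open times below are made up, while the real analysis uses the actual pull request timestamps pulled from github.

from scipy.stats import ttest_ind

# Hypothetical pull request open times in days for the year before and the
# year after bors; the real analysis uses the open/close timestamps from github.
open_days_before = [0.5, 1.2, 3.0, 0.8, 6.5, 2.1, 1.7]
open_days_after = [1.0, 2.5, 4.2, 1.9, 7.3, 3.4, 2.8]

# Welch's two-sample t-test, which does not assume equal variances.
t_stat, p_value = ttest_ind(open_days_after, open_days_before, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value would indicate that the mean open time really did change.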

What about rust-highfive? We would hope that a bot that recommends a reviewer would cause pull requests to get closed sooner.

rust-bors-pr-open

The graph shows there’s no statistical evidence that rust-highfive made a difference in the length of time pull requests were open.

These results seemed odd to me, so I did a little bit of digging to generate a graph of the average time a pull request is open for each month:

rust-pr-open-trend

The length of time pull requests are open has been increasing for most of the Rust project’s history. That explains why comparing pull request age before and after bors showed an increase in the wait time to get a pull request merged. The second line shows the point at which rust-highfive was introduced, and we do see a decline in the wait time. Since the decrease is almost symmetrical with the increase the year before, the average was about the same for the two years.
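The monthly averages behind a graph like that can be computed with a short pandas snippet; this sketch assumes you have created and closed timestamps for each pull request, and the column names are mine rather than FOSS Heartbeat’s.

import pandas as pd

# Hypothetical created/closed timestamps; the real data comes from the
# github API data described in Part 1 of this series.
prs = pd.DataFrame({
    "created": pd.to_datetime(["2014-08-03", "2014-08-20", "2014-09-02", "2014-09-15"]),
    "closed": pd.to_datetime(["2014-08-05", "2014-08-29", "2014-09-03", "2014-09-30"]),
})

prs["open_days"] = (prs["closed"] - prs["created"]).dt.total_seconds() / 86400
monthly_average = prs.groupby(prs["created"].dt.to_period("M"))["open_days"].mean()
print(monthly_average)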

Summary

What can we conclude about github bots from all these statistics?

At the 99% confidence level, there is no statistical evidence that adding the bors bot to automatically merge changes that pass the CI tests changed the chances of a random pull request getting merged.

We can say with 99% confidence that rust-highfive increases a Rust developer’s chances of getting code merged, by as much as 11.7%. The bot initially helped lower the amount of time developers had to wait for their pull requests to be merged, but something else changed in May 2015 that caused the wait time to increase again. I’ll note that Rust version 1.0 came out in May 2015. Rust developers may have been more cautious about accepting pull requests after the API was frozen, or the volume of pull requests may have increased. It’s unclear without further study.

This is awesome, can I help?

If you’re interested in metrics analysis for your community, please leave a note in the comments or drop an email to my consulting business, Otter Tech. I could use some help identifying the github usernames for bots in other communities I’m studying:

This blog post is part of a series on open source community metrics analysis:

Part 1: Measuring the Impact of Negative Language on FOSS Participation

You can find the open source FOSS Heartbeat code and FOSS community metrics on github. Thank you to Mozilla, who is sponsoring this research!

Measuring the Impact of Negative Language on FOSS Participation (Part I)

A recent academic paper showed that there were clear differences in the communication styles of two of the top Linux kernel developers (“Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List”). One leader is much more likely to say “thank you” while the other is more likely to jump into a conversation with a “well, actually”.

Many open source contributors have stories of their patches being harshly rejected. Some people are able to “toughen up” and continue participating, and others will move on to a different project. The question is, how many people end up leaving a project due to harsh language? Are people who experience positive language more likely to contribute more to a project? Just how positive do core open source contributors need to be in order to attract newcomers and grow their community? Which community members are good at mentoring newcomers and helping them step into leadership roles?

I’ve been having a whole lot of fun coming up with scientific research methods to answer these questions, and I’d like to thank Mozilla for funding that research through their Participation Experiment program.

How do you measure positive and negative language?

The Natural Language Processing (NLP) field tries to teach computers to parse and derive meaning from human language. When you ask your phone a question like, “How old was Ada Lovelace when she died?” somewhere a server has to run a speech to text algorithm. NLP allows that server to parse the text into a subject “Ada Lovelace” and other sentence parts, which allows the server to respond with the correct answer, “Ada Lovelace died at the age of 36”.

Several open source NLP libraries, including the Natural Language Toolkit (NLTK) and Stanford CoreNLP, also include sentiment analysis. Sentiment analysis attempts to determine the “tone” and objectiveness of a piece of text. I’ll do more of a deep dive into sentiment analysis next month in part II of this blog post. For now, let’s talk about a more pressing question.
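As a small taste of what these libraries offer, here is a minimal example using NLTK’s VADER analyzer, one of the sentiment tools that ships with NLTK; it assumes the vader_lexicon data has been downloaded and is only meant as an illustration.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the sentiment lexicon

analyzer = SentimentIntensityAnalyzer()
for sentence in ["Thanks for the feedback!",
                 "I think the earlier guide called it emoji name.",
                 "Get humbler, Mr. Arrogant."]:
    # polarity_scores() returns negative, neutral, positive, and compound scores.
    print(sentence, analyzer.polarity_scores(sentence))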

How do you define open source participation?

On the surface, this question seems so simple. If you look at any github project page or Linux Foundation kernel report or Open Stack statistics, you’ll see a multitude of graphs analyzing code contribution statistics. How many lines of code do people contribute? How frequently? Did we have new developers contribute this year? Which companies had the most contributions?

You’ll notice a particular emphasis here, a bias if you will. All these measurements are about how much code an individual contributor got merged into a code base. However, open source developers don’t act alone to create a project. They are part of a larger system of contributors that work together.

In order for code or documentation to be merged, it has to be reviewed. In open source, we encourage peer review in order to make sure the code is maintainable and (mostly) free of bugs. Some reports measure the work maintainers do, but they often lack recognition for the efforts of code reviewers. Bug reports are seen as bad, rather than proof that the project is being used and its features are being tested. People may measure the number of closed vs open bug reports, but very few measure and acknowledge the people who submit issues, gather information, and test fixes. Open source projects would be constantly crashing without the contribution of bug reporters.

All of these roles (reviewer, bug reporter, debugger, maintainer) are valuable ways to contribute to open source, but no one measures them because the bias in open source is towards developers. We talk even less about the vital non-coding contributions people do (conference planning, answering questions, fund raising, etc). Those are invaluable but harder to measure and attribute.

For this experiment, I hope to measure some of the less talked-about ways to contribute. I would love to extend this work to the many different contribution methods and different tools that open source communities use to collaborate. However, it’s important to start small and develop a good framework for testing hypotheses like mine about negative language impacting open source participation.


How do you measure open source participation?

For this experiment, I’m focusing on open source communities on github. Why? The data is easier to gather than for projects that take contributions over mailing lists, because the discussion around a contribution is all in one place, and it’s easy to attribute replies to the right people. Plus, there are a lot of libraries in different languages that provide github API wrappers. I chose to work with the github3.py library because it looked to be actively maintained and had good documentation.

Of course, gathering all the information from github isn’t easy when you want to do sentiment analysis over every single community interaction. When you do, you’ll quickly run into their API request rate limit of 5,000 requests per hour. There are two projects that archive the “public firehose” of all github events: http://githubarchive.org and http://ghtorrent.org. However, those projects only archive events that happened after 2011 or 2012, and some of the open source communities I want to study are older than that. Plus, downloading and filtering through several terabytes of data would probably take just as long as slurping just the data I need through a smaller straw (and would allow me to avoid awkward conversations with my ISP).
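If you want to keep an eye on how close you are to that limit while slurping data, github exposes it through a dedicated endpoint; here is a small check using plain requests (API wrappers such as github3.py generally expose the same information). The GITHUB_TOKEN environment variable is an assumption of this sketch.

import os
import requests

# An OAuth token raises the limit from 60 to 5,000 requests per hour.
headers = {}
token = os.environ.get("GITHUB_TOKEN")
if token:
    headers["Authorization"] = "token " + token

response = requests.get("https://api.github.com/rate_limit", headers=headers)
response.raise_for_status()
core = response.json()["resources"]["core"]
print("requests remaining this hour:", core["remaining"], "of", core["limit"])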

For my analysis, I wanted to pull down all open and closed issues and pull requests, along with their comments. For a community like Rust, which has been around since 2010, their data (as of a week or two ago) looks like this:

  • 18,739 issues
  • 18,464 pull requests
  • 182,368 comments on issues and pull requests
  • 31,110 code review comments

Because of some oddities with the github API (did you know that the JSON data for an issue can describe either an issue or a pull request?), it took about 20 hours to pull down the information I needed.
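That oddity is easy to see with a raw API call: the issues endpoint returns pull requests too, and the presence of a pull_request key is what tells them apart. Here is a sketch using plain requests rather than github3.py, fetching just one page:

import requests

# The issues endpoint silently includes pull requests; a "pull_request" key
# is the only hint. This fetches a single page from rust-lang/rust.
url = "https://api.github.com/repos/rust-lang/rust/issues"
response = requests.get(url, params={"state": "all", "per_page": 100})
response.raise_for_status()

for item in response.json():
    kind = "pull request" if "pull_request" in item else "issue"
    print(f"#{item['number']} [{kind}] {item['title']}")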

I’m still sorting through how exactly I want to graph the data and measure participation over time. I hope to have more to share in a week!

*Edit* The code is available on github, and the reports for various open source communities are also available.

“I was only joking”

There was a very interesting set of tweets yesterday that dissected the social implications of saying, “I was only joking.” To paraphrase:

I’ve been mulling over how this analysis of humor applies to the infamous “Donglegate” incident. Many men in tech responded with anger and fear over a conference attendee getting fired over a sexist joke. “It was only a joke!” they cried.

However, the justification falls flat if we assume that you’re never “just joking” and that jokes define in-groups and out-groups. The sexist joke shared between two white males (who were part of the dominant culture of conferences in 2013) defined them as part of the “in-group” and pushed the African American woman who overheard the “joke” into the “out-group”.

When the woman pushed back against the joke by tweeting about it with a picture of the joker, the people who were part of the in-group and found that joke “funny” were angry. When the joker was fired, it was a sign that they were no longer the favored, dominant group. Fear of loss of social status is a powerful motivator, which is what caused people from the joke’s “in-group” to call for the woman to be fired as well.

Of course, it wasn’t all men who blasted the woman for reacting to a “joke”. There were many women who blasted the reporter for “public shaming”, or who thought the woman was being “too sensitive”, or rushed to reassure men that they had never experienced sexist jokes at conferences. Which brings us to the topic of “chill girls”:

The need for women to fit into a male-dominated tech world means that “chill girls” have to laugh at sexist jokes in order to be part of the “in-group”. To not laugh, or to call out the joker, would be to resign themselves to the “out-group”.

Humans have a fierce need to be socially accepted, and defining in-groups and out-groups is one way to secure that acceptance. This is exemplified in many people’s push back against what they see as too much “political correctness”.

For example, try getting your friends to stop using casually ableist terms like “lame”, “retarded”, “dumb”, or “stupid”. Bonus points if you can get them to remove classist terms like “ghetto” or homophobic statements like “that’s so gay”. What you’ll face are nonsense arguments like, “It’s just a word.” People who call out these terms are berated and no longer “cool”. Unconsciously or consciously, the person will try to preserve the in-groups and out-groups, and their own power from being a part of the in-group.

Stop laughing awkwardly. Your silence is only lending power to oppression. Start calling out people for alienating jokes. Stop preserving the hierarchy of classism, ableism, homophobia, transphobia, and sexism.

White Corporate Feminism

When I first went to the Grace Hopper Celebration of Women in Computing conference, it was magical. Being a woman in tech means I’m often the only woman in a team of male engineers, and if there’s more than one woman on a team, it’s usually because we have a project manager or marketing person who is a woman.

Going to the Grace Hopper conference, and being surrounded by women engineers and women computer science students, allowed me to relax in a way that I could never do in a male-centric space. I could talk with other women who just understood things like the glass ceiling and having to be constantly on guard in order to “fit in” with male colleagues. I had crafted a persona, an armor of collared shirts and jeans, trained myself to interrupt in order to make my male colleagues listen, and lied to myself and others that I wasn’t interested in “girly” hobbies like sewing or knitting. At Grace Hopper, surrounded by women, I could stop pretending, and try to figure out how to just be myself. To take a breath, stop interrupting, and cherish the fact that I was listened to and given space to listen to others.

However, after a day or so, I began to feel uneasy about two particular aspects of the Grace Hopper conference. I felt uneasy watching how aggressively the corporate representatives at the booths tried to persuade the students to join their companies. You couldn’t walk into the ballroom for keynotes without going through a gauntlet of recruiters. When I looked around the ballroom at the faces of the women surrounding me, I realized the second thing that made me uneasy. Even though Grace Hopper was hosted in Atlanta that year, a city that is 56% African American, there weren’t that many women of color attending. We’ve also seen the Grace Hopper conference feature more male keynote speakers, which is problematic when the goal of the conference is to allow women to connect to role models that look like them.

When I did a bit of research for this blog post, I looked at the board member list for Anita Borg Institute, who organizes the Grace Hopper Conference. I was unsurprised to see major corporate executives hold the majority of Anita Borg Institute board seats. However, I was curious why the board member page had no pictures on it. I used Google Image search in combination with the board member’s name and company to create this image:
anita-borg-board

My unease was recently echoed by Cate Huston, who also noticed the trend towards corporations trying to co-opt women’s only spaces to feed women into their toxic hiring pipeline. Last week, I also found this excellent article on white feminism, and how white women need to allow people of color to speak up about the problematic aspects of women-only spaces. There was also an interesting article last week about how “women’s only spaces” can be problematic for trans women to navigate if they don’t “pass” the white-centric standard of female beauty. The article also discusses that by promoting women-only spaces as “safe”, we are unintentionally promoting the assumption that women can’t be predators, unconsciously sending the message to victims of violent or abusive women that they should remain silent about their abuse.

So how do we expand women-only spaces to be more inclusive, and move beyond white corporate feminism? It starts with recognizing that the problem often lies with the white women who start initiatives and fail to bring in partners who are people of color. We also need to find ways to fund inclusive spaces and diversity efforts without big corporate backers.

We also need to take a critical look at how well-meaning diversity efforts often center around improving tech for white women. When you hear a white male say, “We need more women in X community,” take a moment to question them on why they’re so focused on women and not also bringing in more people of color, people who are differently abled, or LGBTQ people. We need to figure out how to expand the conversation beyond white women in tech, both in external conversations, and in our own projects.

One of the projects I volunteer for is Outreachy, a three-month paid internship program to increase diversity in open source. In 2011, the coordinators were told the language around encouraging “only women” to apply wasn’t trans-inclusive, so they changed the application requirements to clarify the program was open to both cis and trans women. In 2013, they clarified that Outreachy was also open to trans men and gender queer people. Last year, we wanted to open the program to men who were traditionally underrepresented in tech. After taking a long hard look at the statistics, we expanded the program to include all people in the U.S. who are Black/African American, Hispanic/Latin@, American Indian, Alaska Native, Native Hawaiian, or Pacific Islander. We want to expand the program to additional people who are underrepresented in tech in other countries, so please contact us if you have good sources of diversity data for your country.

But most importantly, white people need to learn to listen to people of color instead of being a “white savior”. We need to believe people of color’s lived experience, amplify their voices when people of color tell us they feel isolated in tech, and stop insisting “not all white women” when people of color critique a problematic aspect of the feminist community.

Trying to move into more intersectional feminism is one of my goals, which is why I’m really excited to speak at the Richard Tapia Celebration of Diversity in Computing. I hadn’t heard of it until about a year ago (probably because they have less corporate sponsorship and less marketing), but it’s been described to me as “Grace Hopper for people of color”. I’m excited to talk to people about open source and Outreachy, but most importantly, I want to go and listen to people who have lived experiences that are different from mine, so I can promote their voices.

If you can kick in a couple dollars a month to help me cover costs for the conference, please donate on my Patreon. I’ll be writing about the people I meet at Tapia on my blog, so look for a follow-up post in late September!

Code of Conduct Warning Signs

I’ve got something on my chest that needs to be expressed. It’s likely to be a bit ranty, because I’ve got some scars around dealing with this issue. I want to talk about Codes of Conduct (CoCs).

No Trespassing!

Over the last five years, I’ve watched the uptick in adoption of CoCs at open source conferences. I’ve watched conferences try to adopt a CoC and fall completely flat on their faces because they misunderstood the needs of minorities at their conferences. In recent years, I’ve watched open source communities start to adopt CoCs. For some communities, a CoC is an afterthought, a by-product of community leadership stepping up in many different ways to increase diversity in open source.

However, a worrisome trend is happening: I see communities starting to adopt Codes of Conduct without thinking through their implications. A CoC has become a diversity checkmark.

Why is this? Perhaps it’s because stories of harassment have become widespread. People look at the abuse that G4mer Goobers have thrown at women developers, especially women of color and trans women, and they say, “I don’t want those types of people in my community.” For them, a Code of Conduct has become a “No Trespassing” sign for external harassers.

In general, that’s fine. It’s good to stand up to harassers and say, “That’s not acceptable.” People hope that adding a Code of Conduct is like showing garlic to a vampire: they’ll hiss and run off into the darkness.

Pot, meet Kettle

However, a lot of people who are gung-ho about banning anonymous online harassers are often reluctant to clean their own house. They make excuses for the long-standing harassers in their community, and they have no idea how they would even enforce a CoC against someone who is an entrenched member of the community. Someone who organizes conferences. Someone who is a prolific reviewer. Someone who is your friend, your colleague, your drinking buddy.

You see, no one wants to admit that they are “that person”. It’s hard to accept that everyone, including your friends, is unconsciously biased. It’s even harder to admit that your friends are slightly racist/homophobic/transphobic/etc. No one wants to recognize the ableist language they use in their everyday life, like “lame”, “dumb”, or “retarded”. It’s tough to admit that your conference speakers are mostly cis white men because you have failed to network with minorities. It’s difficult to come to grips with the fact that your leadership is toxic. It’s embarrassing to admit that you may be too privileged and so lacking in understanding of minorities’ lived experiences that you may need to reach outside your network to find people to help you deal with Code of Conduct incidents.

Code of Conduct Enforcement

And you will have incidents. People will report Code of Conduct violations. The important question is, how will you handle those incidents and enforce your CoC? You’ve put a “No Trespassing” sign up, but are you willing to escort people out of your community? Take their commit access away? Ask them to take a break from the mailing list? If you don’t decide up front how you’re going to enforce your Code of Conduct, you’re going to apply it unfairly. You’ll give your buddy a break, make excuses like, “But I know they’ve been working on that,” or, “Oh, yeah, that’s just so-and-so, they don’t mean that!”

You need to decide how you’ll enforce a Code of Conduct, and find diverse leadership to help you evaluate CoC violations. And for the love of $deity, if the minorities and louder allies on your enforcement committee say something is a problem, believe them!

Let’s fork it!

Another worrisome trend I see is that the people working on creating Codes of Conduct are not talking to each other. There is so much experience in the open source community leadership in enforcing Codes of Conduct, but it’s become a bike shed issue. Communities without experience in CoC enforcement are saying, “I’ll cherry-pick this clause from this CoC, and we’ll drop that clause because it doesn’t make sense for our community.”

We don’t write legal agreements without expert help. We don’t write our own open source licenses. We don’t roll our own cryptography without expert advice. We shouldn’t roll our own Code of Conduct.

Why? Because if we roll our own Code of Conduct without expert help, it creates a false sense of security. Minorities who rely on a Code of Conduct to grant them safety in an open source community will get hurt. If leadership is implementing a Code of Conduct as a diversity check mark, it papers over the real problem of a community that is unwilling to put energy into being inclusive.

Diversity Check Mark Complete!

I also see smaller communities scrambling to get something, anything, in place to express that they’re a safe community. So they take a standard Code of Conduct and slap it into place, without modifying it to express their communities’ needs. They don’t think about what behaviors they want to encourage in order to make their community a safe place to learn, create, and grow. They don’t think about how they could attract and retain diverse contributors (hint, I recently talked about some ideas on that front). They don’t think about the steps that they as leaders need to take in order to expand their understanding of minorities’ lived experiences, so that they can create a more inclusive community. They don’t think about the positive behaviors they want to see in their community members.

When I see an unmodified version of a Code of Conduct template in a community, I know the leadership has put up the “No Trespassing” sign to stop external harassers from coming in. But that doesn’t mean the community is inclusive or diverse. It could be a walled garden, with barriers to entry so high that only white men with unlimited amounts of spare time and a network of resources to help them can get inside. It could be a barbed-wire fence community with known harassers lurking inside. Or it could be a community that simply found another CoC was good enough for them. I can’t know the difference.

Ask for Expert Advice

My takeaway here is that implementing a Code of Conduct is a hard, long process of cultural change that requires buy-in from the leadership in your community. Instead of having an all-out bike-shed thread on implementing a CoC, where people cherry-pick legal language without understanding the implementation details of removing that language, go talk with an expert. Safety First PDX, Ashe Dryden, and Frame Shift Consulting are happy to provide consulting, for a fee. If you don’t have money to pay them (and you should pay women for the emotional labor they do to create welcoming communities!), then you’ll need to spend a bunch of time educating yourself.

Read *everything* that Safety First PDX has to say about Code of Conduct design and enforcement. Read the HOW-TO design a Code of Conduct post on the Ada Initiative website. Watch Audrey Eschright talk about Code of Conduct enforcement. Look at the community code of conduct list on the Geek Feminism wiki. These are all long reads, but they come from known experts in the field who are offering their expertise to keep our open source communities safe.

In Conclusion

Don’t roll your own Code of Conduct without expert advice. You wouldn’t roll your own cryptography. At the same time, don’t make a Code of Conduct into a check mark.

Metrics of Haters

When I posted my Closing a Door post, I mentioned that a team of moderators would be filtering comments for me. Comments that did not meet my comment policy would not be approved. Moderators also found that some comments simply did not further the conversation, were unclear and confusing due to translation issues, or were just contentless spews of hatred.

The comments on that post are now closed. The moderators approved a total of 254 comments, with 213 comments on my “Closing a Door” post, and 39 comments on my follow-up post “What Makes A Good Community?” The moderators also filtered out 186 comments total on those two posts. Now that the internet shit storm is over, I thought it would be interesting to take a peek into the acid-filled well in order to pull out some metrics.

Of course, I didn’t want to actually read the comments. That would be silly! It would completely defeat the purpose of having comment moderators and let the trolls win. So, instead I used the power of open source to generate the metrics. I used the WordPress Exporter plugin to export all the comments on the two posts in XML. Then I used the python wpparser library to parse the XML into something sensible. From there, the program wrote the commenters’ names, email addresses, and IP addresses [1] into a CSV. I did some manual categorization of that information in Google docs.
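For anyone who wants to repeat this on their own blog, the same extraction can be done with nothing but the Python standard library instead of wpparser. This sketch assumes a WXR 1.2 export (older exports use a different wp: namespace version), and the file names are placeholders.

import csv
import xml.etree.ElementTree as ET

# Namespace used by WordPress WXR 1.2 exports; older exports use 1.0 or 1.1.
WP = {"wp": "http://wordpress.org/export/1.2/"}

tree = ET.parse("blog-export.xml")  # the file produced by the WordPress Exporter plugin
with open("comments.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["name", "email", "ip"])
    for comment in tree.iter("{http://wordpress.org/export/1.2/}comment"):
        writer.writerow([
            comment.findtext("wp:comment_author", default="", namespaces=WP),
            comment.findtext("wp:comment_author_email", default="", namespaces=WP),
            comment.findtext("wp:comment_author_IP", default="", namespaces=WP),
        ])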

Repeat Offenders or Drive-by Haters?

70% of the 186 filtered comments were from unique IP addresses. The remaining 30% of comments were generated by 19 different people, who left an average of three comments each. The most persistent troll commented 10 times.

Anonymous Cowards or Brave Truth Tellers?

72% of the 186 comments did not include a full name. Of the commenters that did not include a full name:

  • 39 people used just a first name, making up 24% of the comments.
  • 25 people used what looks like internet nicks, accounting for 16% of the comments.
  • 17 people used various forms of the word “anonymous” in the name field, making up 9% of the comments.
  • 12 people used an English word instead of a name, accounting for 8% of the comments.
  • 4 people used obviously fake names, accounting for 7% of the comments.
  • 8 people used their initials or one letter, accounting for 5% of the comments.
  • 5 people used a slur in their name, accounting for 3% of the comments.
  • 2 people used a threat in their name, accounting for 1% of the comments. [Edit: make that 3, or 2%]

Community Members or Internet Trolls?

38 people used a full name, accounting for 28% of the comments. That means approximately 1/3 were brave enough to put their real name behind their comments. (Or a full fake name.) The question becomes, are these people actually a part of the open source community? Are they people who have actually interacted on an open source mailing list before? To answer these questions, I chose to search the author names in the Mailing List Archives (MARC), where a variety of open source mailing lists are archived, including the Linux kernel subsystem mailing lists, BSD, database lists, etc.

Of the 38 people who used their real name, 14 people had interacted on an open source mailing list archived by MARC. They made up 8% of the filtered comments. Ten of those people had more than 10 mails to the lists.

[Edit] Of the 25 people that used what looked like internet nicks, 11 of them may be open source users (see analysis below in the comments). That accounted for 8% of the filtered comments.

The important take away here is that only 16% of the filtered comments were made by open source users and developers. This is an important finding, since the article itself was about open source community dynamics.

[1] Before you scream about privacy, note that my comment policy allows me to collect and potentially publish this information.

Building a custom Intel graphics stack

When I worked as a Linux kernel developer, I often ran across people who were very concerned about compiling and installing a custom kernel. They really didn’t like running the bleeding edge kernel in order to check that a specific bug still existed. Who can blame them? A rogue kernel can corrupt your file system and you could lose data.

Fortunately, there is a safer way to build the latest version of drm and Mesa in order to check if an Intel bug still exists in the master branch. Since Mesa is just a userspace library, it is possible to install it to a custom directory, set the right environment variables, and have your programs dynamically link against those custom drm and Mesa binaries. Your desktop programs will still run under your distro’s system-installed mesa version, but you can run other programs linked against your custom mesa.

Unfortunately, mesa has some dependencies, and the instructions for how to build into a custom directory are kind of scattered all over the mesa homepage, the DRI wiki, the Xserver wiki, github, and mailing list posts. I’m going to attempt to condense these instructions into one single place, and then clean up those pages to be consistent later.

Debug build or better performance?

In this tutorial, I’ll assume that you want to build a version of drm and mesa with debugging enabled. This *will* slow down performance, but it will enable you to get backtraces, run gdb, and gather more debugging information than you normally would. If you don’t want a debug build, remove the parts of the commands that add the “debug” flag to the USE environment variable or config.mk files.

The point of this tutorial is to be able to install drm and mesa in a directory, so that you don’t have to install them over your distro’s binaries. This means you’ll be able to run the specific test you need, without running into other bugs by running your full desktop environment on the bleeding edge graphics stack. In this tutorial, I will assume you want to put your graphics installation in $HOME/graphics-install. Change that to whatever your heart’s desire is.

If you are working behind a proxy, you’ll need to have a .gitconfig file in your homedir that tells git how to clone through the proxy.

I also assume you’re running a debian-based system, specifically Ubuntu 14.04 in my case. If you’re on an RPM-based distro, change the package install commands accordingly.

Get mesa dependencies

sudo apt-get build-dep libdrm mesa mesa-utils
sudo apt-get install linux-headers-`uname -r` \
    libxi-dev libxmu-dev x11proto-xf86vidmode-dev \
    xutils-dev mesa-utils llvm git autoconf automake \
    libtool ninja-build libgbm-dev

Clone the repositories

mkdir git; cd git
git clone git://anongit.freedesktop.org/git/mesa/mesa
git clone git://anongit.freedesktop.org/mesa/drm
git clone git://github.com/chadversary/dev-tools.git
git clone git://anongit.freedesktop.org/piglit
git clone git://github.com/waffle-gl/waffle.git
git clone git://anongit.freedesktop.org/mesa/demos.git

Set up Chad’s development tools

Chad Versace has been working on a set of scripts that will set up all the right environment variables to run programs that will use custom-installed mesa and drm binaries. Let’s get those configured properly.

Edit your .bashrc to include the following lines:

export GOPATH=$HOME/go
export PATH=/usr/lib/ccache:$HOME/bin:$PATH:$GOPATH/bin
export PYTHONPATH=~/bin/mesa-dev-tools/bin/:$PYTHONPATH

Now it’s time to set up Chad’s tools.

cd dev-tools/

We’ll be installing everything in ~/graphics-install, so we need to create a config.mk file with these contents:

prefix := $(HOME)/graphics-install
USE := "debug"

This will add the debug flag to all builds, which will add symbols so you can use gdb (as well as add some additional code that could impact performance, so don’t add the flag if you’re doing performance testing!).

Build and install the development scripts:

make && make install

Exit your current shell, and start a new shell, so that the changes to the .bashrc and the installation of Chad’s scripts take effect.

Next, we need to get all the paths set properly to use Chad’s scripts to build mesa and libdrm into ~/graphics-install. We invoke the prefix-env script, and tell it to exec the command to start a new shell:

cd git/dev-tools
PREFIX="$HOME/graphics-install" USE="debug" \
    bin/prefix-env exec --prefix=$HOME/graphics-install bash

Double check that worked by seeing whether we have the right mesa-configure script on our path:

sarah@dingo:~/git/dev-tools$ which mesa-configure
/home/sarah/graphics-install/bin/mesa-configure

Check which Mesa version you’re running. Later, after installing a custom Mesa, we’ll verify the installation by confirming that the active Mesa version has changed.

sudo glxinfo > /tmp/glxinfo-old.txt

Note that glxinfo calls through the Xserver to get information about which Mesa we’re using. If your system xorg installation is too old, the Xserver won’t be able to find an API-compatible version of Mesa, and you’ll see errors like:

Error: "couldn't find RGB GLX visual or fbconfig"

Fortunately, we can run many Mesa programs without involving the Xserver. Another way to find out which version of mesa you’re running without going through the Xserver is to use the wflinfo command:

sudo wflinfo --platform gbm --api gl > /tmp/wflinfo-old.txt

We can see which version of mesa (11.0.2) is installed by default on Ubuntu 14.04:

sarah@dingo:~/git/dev-tools$ grep Mesa /tmp/*info-old.txt
client glx vendor string: Mesa Project and SGI
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.0.2
OpenGL version string: 3.0 Mesa 11.0.2
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.0.2

Now that we have Chad’s tools set up and the environment variables in place, it’s important to build everything from the shell where you ran the prefix-env command. If you need to open up additional virtual consoles, make sure to change into ~/git/dev-tools/ and re-run the prefix-env command.

Building libdrm

The direct rendering manager library, libdrm, is a prerequisite for mesa. The two projects are pretty intertwined, so you need to have updated installations of both.

Change directories into your libdrm repository, and configure libdrm with the libdrm-configure script (note that PREFIX was already set when we exec’ed with the prefix-env script):

USE="debug" libdrm-configure
make && make install

Building Mesa

Change directories into your mesa repository, and configure mesa with the mesa-configure script:

cd ../mesa
USE="debug" mesa-configure
make && make install

Building Waffle

Waffle is a library for selecting an OpenGL API and window system at runtime.

cd ../waffle
USE=debug waffle-configure
ninja && ninja install

For some reason, waffle is different from all the other projects, and likes to install libraries into $PREFIX/lib/x86_64-linux-gnu/. If you’re on a debian-based system, you may have to change the configuration files, or simply move the libraries one directory down.

Building glxinfo

Confusingly, a useful debugging tool like glxinfo is found in a mesa repository named demos. Change into that directory:

cd ../demos

Since Chad’s tools don’t cover installation of the demo tools, we’ll have to configure them by hand:

autoreconf --verbose --install -s
./configure --prefix="$HOME/graphics-install"
make -j8 && make install

Confirm installation

Confirm that the environment’s Mesa version matches the version you installed. It should differ from the Mesa version we checked earlier.

sudo glxinfo > /tmp/glxinfo-new.txt

Or run wflinfo instead:

sudo wflinfo --platform gbm --api gl > /tmp/wflinfo-new.txt
grep Mesa /tmp/*info-new.txt

You should see something about a development version of mesa in the output.

Building Piglit

Piglit is the test infrastructure and tests for libdrm and mesa. Let’s build it:

cd ../piglit
USE=debug piglit-configure

Piglit has a slightly different build system than drm, mesa, and waffle. After the first build, the way cmake’s default make-based build tracks the piglit test dependencies means it takes a very long time to recompile the tests after a small change. Instead, it’s recommended to use the ninja build system with piglit.

Make piglit:

ninja

Install piglit:

ninja install

Run your tests

Anytime we want to use the newly installed mesa and drm, we need to rerun the prefix-env script to set up all the graphics environment variables to point to those binaries:

PREFIX="$HOME/graphics-install" USE="debug" \
    bin/prefix-env exec --prefix=$HOME/graphics-install bash

Since we haven’t compiled the full Xserver stack, we have to run piglit with a platform other than X11. If you run `piglit run --help`, you’ll see that the platform can be x11_egl, glx (which actually calls the GLX API through the Xserver), mixed_glx_egl (which also implies going through the Xserver), wayland, or gbm. The simplest platform is gbm.

Here’s how you run your very first sanity test with gbm:

PIGLIT_PLATFORM=gbm ./piglit run \
    tests/sanity.tests results/sanity.results

If the output says you passed, give yourself a pat on the back! If not, you probably don’t have something installed correctly. You may want to exit all shells, run `git clean -dfx` (to remove all untracked and ignored files, including build output) in all the repos, and try again.
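A quick way to clean every repo in one go might look like this (the repository names and the ~/git layout are assumptions based on the directories used above):

# Remove untracked and ignored files (i.e. build output) from each repo
for repo in drm mesa waffle demos piglit; do
    (cd ~/git/"$repo" && git clean -dfx)
done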

To get a more detailed test report, you can run the `piglit summary` command with either console (for text output, good if you don’t have X running), or with html to generate pretty webpages for you to look at on another machine. Piglit will also output test results in a form Jenkins can use.

./piglit summary console results/sanity.results
./piglit summary html --overwrite summary/sanity results/sanity.results

You’ll need the overwrite or append flag if you’re writing results to the same directory.
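If you generated the html summary, you can open it in a browser on that machine (index.html here is an assumption about piglit’s output layout):

xdg-open summary/sanity/index.html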

There are even more explanations of what you can do with piglit in these two blog posts.

Running games or benchmarks

Once you’ve run the prefix-env script, you should be able to launch benchmarks or other tests. Running games or Steam with a custom mesa installation is harder. Since most games are going to use the Xserver platform to call into Mesa’s GL or EGL API, you may need to compile a new Xserver as well.
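To double-check that a benchmark or demo really picks up the custom Mesa, you can look at what its binary links against. glxgears here is just a stand-in for whatever you’re launching, and this assumes prefix-env put $PREFIX/lib on your library path:

which glxgears
ldd "$(which glxgears)" | grep graphics-install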

Optional kernel installation

Sometimes you may want to run the bleeding-edge Intel graphics kernel. Confusingly, the kernel isn’t hosted on git.kernel.org! Use the drm-intel-nightly branch from the drm-intel repo on freedesktop.org:

git clone git://anongit.freedesktop.org/drm-intel

Instructions on compiling a custom kernel can be found here:

http://kernelnewbies.org/FirstKernelPatch

Additionally, you may need to set an i915 kernel module parameter to enable new hardware support. You can do this by changing the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub to this:

GRUB_CMDLINE_LINUX_DEFAULT="i915.preliminary_hw_support=1"

And then you’ll need to update your grub configuration files in /boot by running:

sudo update-grub
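After rebooting, you can verify the parameter actually took effect. The sysfs path below assumes your kernel actually has the preliminary_hw_support option:

# The kernel command line should contain i915.preliminary_hw_support=1
cat /proc/cmdline
# And the i915 module should report the value it picked up
cat /sys/module/i915/parameters/preliminary_hw_support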

Graphics linkspam: Bugs, bugs, I’m covered in bugs!

Reporting bugs to Intel graphics developers (or any open source project) can be intimidating. You want the right developers to pay attention to your bug, so you need to provide enough information to help them classify the bug. Ian Romanick describes what makes a good Mesa bug report.

One of the things Ian talks about is tagging your bug report with the right Intel graphics code name, and providing PCI ID information for the graphics hardware. Chad Versace provides a tool to find out which Intel graphics you have on your system. That tool is also useful for translating the marketing names to code names and hardware details (like whether your system is a GT2 or GT3).
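If you just need the raw PCI ID, lspci can also dig it out. This is a generic alternative, not Chad’s tool:

lspci -nn | grep -iE 'vga|3d|display'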

In the “omg, that’s epic” category, Adrian analyzes the graphics techniques used in Grand Theft Auto V on PS3. It’s a great post with a lot of visuals. I love the discussion of deleting every other pixel to improve performance in one graphics stage, and then extrapolating them back later. It’s an example of something that’s probably really hardware specific, since Kristian Høgsberg mentioned he doesn’t think it will be much help on Intel graphics hardware. When game designers know they’re only selling into one platform, they can use hardware-specific techniques to improve graphics performance. However, it will bite them later if they try to port their game to other platforms.

What makes a good community?

*Pokes head in, sees comments are generally positive*

There’s been a lot of discussion in my comment sections (and on LWN) about what makes a good community, along with suggestions of welcoming open source communities to check out. Your hearts are in the right place, but I’ve never found an open source community that doesn’t need improvement. I’m quite happy to give the Xorg community a chance, mostly because I believe they’re starting from the right place for cultural change.

The thing is, reaching the goal of a diverse community is a step-by-step process. There are no shortcuts. Each step has to be complete before the next level of cultural change is effective. It’s also worth noting that each step along the way benefits all community members, not just diverse contributors.

Level 0: basic human decency

In order to attract diverse candidates, you need to be known as a welcoming community, with a clear set of agreed-upon social norms. It’s not good enough to have a code of conduct. Your leaders need to be actively behind it, and it needs to be enforced.

A level 0 welcoming community exhibits the following characteristics:

Level 1: on-boarding

The next phase in improving diversity is figuring out how to on-board newcomers. If diverse candidates are only 1-10% of newcomers, but you have a 90% fail rate for people who try to make their first contribution, well, you can’t expect many diverse newcomers to stick around, can you? It’s also essential to explain your unwritten tribal knowledge, so that diverse candidates (who are more likely to be afraid of upsetting the status quo) know what they’re getting into.

Signs of a level 1 welcoming community:

  • Documentation on where to interact with the community (irc, mailing list, bug tracker, etc)
  • In-person conferences to encourage networking with new members
  • Video or in-person chats to put a face to a name and encourage empathy and camaraderie
  • Documented first steps for compiling, running, testing, and polishing contributions
  • Easy, no-setup web harness for testing new contributions
  • Step-by-step tutorials, which are kept up-to-date
  • Coding style (what’s required and what’s optional, and who to listen to when developers disagree)
  • Release schedule and feature cut-off dates
  • How to give back non-code contributions (bug reports, docs, tutorials, testing, event planning, graphical design)

Level 2: meaningful contributions

The next step is figuring out what to do with these eager new diverse candidates. If they’ve made it this far through the gauntlet of toxic tech culture, they’re likely to be persistent, smart, and seeking a challenge. If you don’t have meaningful, bigger projects for them to contribute to, they’ll move on to the next shiny thing.

Signs of a level 2 welcoming community:

  • Newbie todo lists
  • Larger, self-contained projects
  • Welcoming, available mentors
  • Programs to pay newbies (internships, summer of code, etc)
  • Contributors are thanked with heartfelt sincerity and an explicit acknowledgment of what was good and what could be improved
  • Community creates a casual feedback channel for generating ideas with newcomers (irc, mailing list, slack, whatever works)
  • Code of conduct encourages developers to assume good intent

Level 3: succession planning

The next step for a community is to figure out how to retain those diverse candidates. How do you promote these new, diverse voices in order to ensure they impact your community at a leadership level? If your leadership is stale, comprised of the same “usual faces”, people will leave when they start wanting to have more of a say in decisions. If your community sees bright diverse people quietly leave, you may need to focus on retention.

Signs of a level 3 welcoming community:

  • Reviewers are rewarded and questions from newcomers on unclear contributions are encouraged
  • Leaders and/or maintainers are rotated on a set time schedule
  • Vacations and leaves of absence are encouraged, so backup maintainers have a chance to learn new skills
  • Community members write tutorials on the art of patch review, release management, and the social side of software development
  • Mentorship for new presenters at conferences
  • Code of conduct encourages avoiding burnout, and encourages respect when people leave

Level 4: empathy and awareness

Once your focus on retention and avoiding developer burnout is in place, it’s time to tackle the task most geeks avoid: general social issues. Your leaders will have different opinions, as all healthy communities should! However, you need to take steps to ensure the loudest voice doesn’t always win by tiring people out, and that less prominent and minority voices are heard.

Signs of a level 4 welcoming community:

  • Equally values developers, bug reporters, and non-code contributors
  • Focuses on non-technical issues, including in-person discussions of cultural or political issues with a clear follow-up from leaders
  • Constantly improves documentation
  • Leadership shows the ability to recognize their mistakes and change when called out
  • Community manager actively enforces the code of conduct when appropriate
  • Code of conduct emphasizes listening to different perspectives

Level 5: diversity

Once you’ve finally got all that cultural change in place, you can work on actively seeking out more diverse voices and have a hope of retaining them.

Signs of a level 5 welcoming community:

  • Leadership gatherings include at least 30% new voices, and familiar voices are rotated in and out
  • People actively reach outside their network and the “usual faces” when searching for new leaders
  • Community participates in diversity programs
  • Diversity is not just a PR campaign – developers truly seek out different perspectives and try to understand their own privilege
  • Gender presentation is treated as a non-issue at conferences
  • Conferences include child care, clearly labeled veggie and non-veggie foods, and a clear event policy
  • Alcoholic drinks policy encourages participants to have fun, rather than get smashed
  • Code of conduct explicitly protects diverse developers, acknowledging the spectrum of privilege
  • Committee handling enforcement of the code of conduct includes diverse leaders from the community

The thing that frustrates me the most is when communities skip steps. “Hey, we have a code of conduct and child care, but known harassers are allowed at our conferences!” “We want to participate in a diversity program, but we don’t have any mentors and we have no idea what the contributor would work on long term!” So, get your basic cultural changes done first, please.

*pops back off the internet*

Edit: Please stop suggesting BSDs or Canonical/Ubuntu as “better” communities.