As newsrooms incorporate news application teams, one of the first questions they have to answer for themselves is what technologies to choose, and how to set up developer and web hosting environments that are sane and tuned to serve news apps. As part of her P5 Residency, developer Peggy Bustamante from Digital First Media's Project Thunderdome spent a few days mapping ProPublica's infrastructure. She's written a post that lays out some alternatives that came out of a discussion on the NICAR-L mailing list, and the answers that the Thunderdome team came up with.
What follows is ProPublica's advice on developer and server setups for news apps teams.
One of the early decisions you should make is which programming language and which development framework to use. Two prominent options are Ruby on Rails and Python/Django. There are other decent options, like PHP/Laravel and even the Microsoft stack, but if you want to share server-side code with other newsroom nerds (and you do), you're best off choosing between Ruby and Python.
The news apps world is split somewhere down the middle between Rails developers and Django developers, with the Rails developers seemingly concentrated on the island of Manhattan. Both frameworks are really great, and they both inspire a lot of pride and trash talking among their adherents. Which one to pick is predominantly a personal preference. If you and your developers like coding in Ruby, use Rails. If they like coding in Python, use Django. Neither is an objectively correct choice. There are newsrooms that have built apps in .NET and in PHP, too. Within certain bounds of sanity, my rule is that if you're going to be writing in a language all day every day for years, you might as well pick one you enjoy, and for which there's a decent chance you'll be able to hire experienced developers.
An important caveat: The one thing you shouldn't do is pick a language solely because of corporate standards or because "it's what the CMS is written in." News apps are a new kind of development for newsrooms, and they require a new, more nimble way of thinking. The benefits of corporate standardization are real, but they shouldn't dictate the tools developer/journalists use any more than business-side standards dictate how we report the news.
Once you've picked the language and framework, do your best to stick to them. Trying out new languages and approaches for each project increases your technical debt, complicates your life needlessly and makes hiring more difficult.
The news app cube at ProPublica is currently eight people, but don't think you can't get started until you've got a budget that big. We started out much smaller (at first just me, then three of us for a long while), and many small teams around our industry make incredible news apps.
I often hear when I talk to newsrooms trying to assemble teams that there is no good, affordable dev talent in their area. My response is always the same: They're looking in the wrong places. Here's why:
The talents needed to be a great news app developer form a three-legged stool. The first leg is the ability to do journalism -- the imperative to get the facts right, "editorial judgment," etc. The second leg is enough design acumen to make an interactive presentation that people understand and that tells a coherent "story." The third leg is the ability to write code quickly. I would argue that this third leg is the most teachable of the three. The fact is, the aforementioned development frameworks do a lot of the real heavy lifting, code-wise; the kinds of apps we create -- where the data is complex but the interactions quite simple -- tend not to need truly high-performance dynamic code, and server caches like Varnish make even slightly inefficient code fast enough to handle enormous traffic.
So while you may think that you need a developer with a CS degree, and you may think you're competing with Google for talent, in fact the first place you should be looking for talent is your own newsroom. Most newsrooms I know have graphics editors and designers who have been secretly writing website scrapers and using graphics frameworks like D3 for years. Graphics people have the journalism chops and the design acumen to make outstanding news apps and often just need to be given the space and time, and maybe a little training, to step into the role of news app developer. Then, as time goes on, you'll find that their success grows better than linearly as you add resources.
It's also very important that the people on your news app teams think of themselves as journalists. They should file FOIAs and make reporting calls. If they're working with a traditional reporter they should read drafts and see outlines and story memos right from the beginning. Most importantly they should report to an editor and not the IT department. IT plays an important role in your newsroom but news application developers are much more like reporters than they are like corporate devs. They should be managed the right way.
At ProPublica, we're all on Macs, because the Mac OS's Unix-like underpinnings make it a pretty natural environment to develop code for deployment on Linux servers, and its popularity makes it easy to integrate it into corporate email systems, to buy Photoshop for it, etc. We know other developers who happily use Linux on their dev workstations. It wouldn't surprise me to hear of devs out there using Windows, though I'll leave as an exercise for the reader how to set up Windows for web development.
We each have two monitors: One is a nice bright, sharp iMac screen which is useful when you're staring at a screen for 8-10 hours a day. The second is one we call a CDM, a low-end "cheap Dell monitor" which is useful to look at pages as we're building them. It's a worst-case scenario end-user setup. You'll find that some designs that look great on a beautiful Apple screen will seem washed out and almost-invisible on the CDM.
We do browser testing using several virtual machines, one for each supported version of Internet Explorer. We support IE8 and up, though you should decide what browsers to support for yourselves after taking a good look at your analytics. All the virtual machines run on a server in our server closet. Until recently we used the free version of VMware Server and existing Windows licenses, so the setup costs were pretty minimal. We connect to the VMs using Microsoft's Remote Desktop Connector rather than running virtualization software on our computers. Another option is to use the Microsoft-provided virtual machines, which can be installed free of charge with ievms.
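To make "take a good look at your analytics" a little more concrete, here is a minimal Python sketch of one way to turn visit counts into a support list. The visit data, the 2 percent cutoff and the function names are all made up for illustration; your own analytics export and your own threshold will differ.

```python
from collections import Counter


def browser_share(visits):
    """Return each browser's share of visits as a fraction of the total."""
    counts = Counter(visits)
    total = sum(counts.values())
    return {browser: n / total for browser, n in counts.items()}


def browsers_to_support(visits, threshold=0.02):
    """List browsers whose share of visits meets the (hypothetical) cutoff."""
    shares = browser_share(visits)
    return sorted(b for b, share in shares.items() if share >= threshold)


# Made-up sample: one entry per visit, labeled by browser.
sample_visits = (
    ["Chrome"] * 50 + ["Firefox"] * 25 + ["IE9"] * 15 + ["IE8"] * 9 + ["IE6"] * 1
)

supported = browsers_to_support(sample_visits)
print(supported)
```

With this invented sample, IE6 falls below the cutoff and drops off the list while IE8 stays on it. The point is only that the decision can come from your own numbers rather than from habit.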
While we frequently collaborate on news apps, we don't take the approach that the entire team works on one project at a time. One of us takes the lead on each project (you can tell who was the lead on a given project by looking at the bylines on the app: the lead developer is typically the first). We find that having one person who can take the time to read drafts, collaborate with a reporter, do any necessary research and reporting, and in general have the responsibility and vision for an app is really useful. Big scrums, while often very useful, can end up breaking up projects into such tiny chunks that journalistic opportunities are lost. Somebody doing data gathering and analysis should be thinking of story possibilities, and the designer should be very well steeped in the vagaries of the data, and the backend and front-end development work often flow together so much that a single developer should do them both anyway.
Naturally, not everybody is a true polymath and can do design, back-end code and journalism equally well. But each of us has strengths and we help each other. The stronger designers help the stronger coders, who help the stronger reporters.
ProPublica puts all of its code into source code management, which is an absolute requirement for development teams (and even solo developers). We use git. There are others, but the competition is essentially over and git won.
Using git enables us to work together on projects without clobbering each other’s work, to keep track of our changes and to easily roll back to previous versions when we need to.
While we're developing apps, we run them locally using Rails' built-in server. That lets us test and experiment with things outside production so there’s no risk of bugs or nonpublic material appearing on the Internet. We pass URLs to each other and to the wider newsroom using special local DNS records, so sending a test URL to a colleague is as easy as sending any other kind of URL, though of course they can only connect to our local apps while inside our offices.
Rails apps are pretty self-contained so this setup is pretty easy to manage. Some other teams use virtual machines to ensure a closer match between developer workstation and production server, but our system has suited us fine so far.
In our news applications, the data is often as finely honed as an artisanal cheese, so we tend to check even our data sets into version control, both to version them and to make deploys easier.
When a ProPublica news app is ready to deploy, we use Capistrano to send our apps up to a production server. Capistrano is a Ruby-based system (the Pythonic equivalent is called Fabric). It lets you specify exactly how your app should be deployed -- where on the server your apps are located, any web server config changes that need to be made to accommodate the code, and any special commands that need to be executed at any point in the deployment -- like cache invalidation, database re-seeding, restarting the app, etc.
Our deployment recipe is pretty specific to us, but Capistrano makes it easy to automate pretty much everything about your deployment strategy. It even backs up previous versions of your apps so it's dead simple to roll back in case something goes wrong with a deploy. If you want to enable less-technical people to execute some aspects of your deployment system you can explore Webistrano, which is sort of a web-enabled version of Capistrano.
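If you haven't seen how this style of deployment makes rollbacks cheap, the idea can be sketched in a few lines of Python. This is not Capistrano's code or API, just a toy version of the general pattern of keeping each deploy in its own timestamped directory and pointing a `current` symlink at the live one. The function names, directory names and file contents are illustrative assumptions, and the symlink swap assumes a POSIX system.

```python
import os
import tempfile


def _point_current_at(app_root, release_dir):
    """Swap the `current` symlink to a release in one rename (POSIX)."""
    link = os.path.join(app_root, "current")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, link)


def deploy(app_root, release_name, files):
    """Write a new release directory, then make it the live one."""
    release_dir = os.path.join(app_root, "releases", release_name)
    os.makedirs(release_dir)
    for filename, contents in files.items():
        with open(os.path.join(release_dir, filename), "w") as f:
            f.write(contents)
    _point_current_at(app_root, release_dir)
    return release_dir


def rollback(app_root):
    """Point `current` back at the release just before the live one."""
    releases = sorted(os.listdir(os.path.join(app_root, "releases")))
    live = os.path.basename(os.path.realpath(os.path.join(app_root, "current")))
    index = releases.index(live)
    if index == 0:
        raise RuntimeError("no previous release to roll back to")
    previous = os.path.join(app_root, "releases", releases[index - 1])
    _point_current_at(app_root, previous)
    return previous


# Hypothetical app: two deploys, then a rollback.
app_root = tempfile.mkdtemp()
deploy(app_root, "20130101120000", {"index.html": "version 1"})
deploy(app_root, "20130102120000", {"index.html": "version 2"})
rollback(app_root)

with open(os.path.join(app_root, "current", "index.html")) as f:
    live_page = f.read()
print(live_page)
```

Because the old release is still sitting on disk, rolling back is just a matter of repointing the symlink. Nothing has to be rebuilt or re-uploaded.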
One of the most important choices you'll make is your server environment. I'm going to talk about Linux, and since you're still reading, I'm going to assume you've already decided on using Linux.
There are a bunch of options for web hosting but I'll talk about two: Real-server hosting and the cloud.
With real servers, you may have the advantage of existing IT infrastructure. You may already have a server room and tons of bandwidth, and adding a few web servers would only be an incremental cost. But adding servers is a slow enough process that scaling out to meet temporary demand (say, adding ten servers on election day that you don't need anymore the day after) can attenuate the benefits of self-hosting. Also, in my experience IT departments want a say over what gets installed on servers they're expected to maintain, and depending on your IT department that might introduce cross-departmental management problems.
Cloud hosting gives you incredible flexibility when you need on-demand server instances, quick scale-outs, rapid OS migrations, etc. Cloud servers are not as fast as real servers, and the costs can scale quickly relative to real servers. A low-end cloud server can sometimes cost only a few dollars or less a day (you pay by the hour), but cloud servers can get expensive on the high end. Cloud is best when flexibility is more important than raw performance. In terms of security, cloud and real servers are a tie.
ProPublica’s production server environment uses Amazon Web Services. We use a fairly plain vanilla three-tier architecture with a cache server in front of a few application servers in front of database servers. Some of our databases are hosted using Amazon's face-meltingly cool Relational Database Service. RDS runs just like MySQL for us. It can also be a drop-in replacement for Oracle and SQL Server, though we don't need those. RDS does not support PostgreSQL, so for apps that require it we run it on a real EC2 instance.
For cache we run Varnish, which is a very high-performing caching HTTP reverse proxy. Varnish makes it so that each unique request only hits our backend servers once until the cached copy expires. It's so insanely fast that we once had an app get 2 million page views in a few hours and our CPU load average barely budged off of zero. I'm pretty sure Varnish contains alien technology.
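The reason a cache in front of your app servers helps so much can be shown with a toy Python sketch. This is nothing like Varnish's actual implementation or configuration. The `TinyCache` class, the 60-second lifetime and the URLs are all assumptions for illustration. It only demonstrates the basic idea that repeat requests for the same URL never reach the app.

```python
import time


class TinyCache:
    """A toy response cache keyed by URL, with a fixed lifetime per entry."""

    def __init__(self, backend, ttl_seconds=60, clock=time.monotonic):
        self.backend = backend      # function that actually renders a page
        self.ttl = ttl_seconds      # how long a cached response stays fresh
        self.clock = clock
        self.store = {}             # url -> (expires_at, response)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh copy: the backend is never touched
        response = self.backend(url)
        self.store[url] = (now + self.ttl, response)
        return response


backend_hits = []


def slow_app(url):
    """Stand-in for an app server; records every request it has to handle."""
    backend_hits.append(url)
    return "page for " + url


cache = TinyCache(slow_app)
for _ in range(1000):
    cache.get("/schools/123")       # hypothetical URL, requested 1,000 times
cache.get("/schools/456")

print(len(backend_hits))
```

A thousand and one requests come in, but the stand-in app only does work for the two distinct URLs. That is why even slightly inefficient app code can survive a traffic spike behind a cache.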
Behind our Varnish box are several app servers that run the news apps projects. They are not directly accessible over the Internet.
Incidentally, ProPublica's systems started much more simply. We started out with just two servers, because even from the beginning you'll want to keep your database on a separate server. For a long time those two Amazon servers were all we had, and we served a lot of traffic with just that. Our current setup can scale out incredibly easily by adding more app servers.
Peggy's post talks about alternatives to doing things this way, and there are lots. In addition to using real servers instead of the cloud, some teams "bake out" their work using dedicated computers and then upload the baked-out static files to a service like Amazon's S3. Static files can't crash and they can handle insane amounts of traffic. But there are big tradeoffs. If you need to store user input, or if you have lots and lots of possible application end points, or if you want to let users search for arbitrary terms, baking out might not be the best option.
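For a sense of what "baking out" means in practice, here is a minimal Python sketch that renders a handful of records to static HTML files on disk. The records, field names, page template and output directory are invented for illustration, and the step of uploading the files to a static host is left out entirely.

```python
import html
import os
import tempfile

PAGE_TEMPLATE = "<html><body><h1>{name}</h1><p>Score: {score}</p></body></html>"


def bake(records, out_dir):
    """Render one static HTML file per record and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for record in records:
        page = PAGE_TEMPLATE.format(
            name=html.escape(record["name"]),
            score=html.escape(str(record["score"])),
        )
        path = os.path.join(out_dir, record["slug"] + ".html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(page)
        paths.append(path)
    return paths


# Hypothetical data set: one page per school.
sample_records = [
    {"slug": "lincoln-high", "name": "Lincoln High", "score": 87},
    {"slug": "king-middle", "name": "King & Co. Middle", "score": 91},
]

out_dir = tempfile.mkdtemp()
baked_paths = bake(sample_records, out_dir)
print(sorted(os.listdir(out_dir)))
```

The tradeoff described above shows up right away. Every page has to be known ahead of time, so a finite list of schools bakes out nicely, while free-text search or anything that stores user input has no file to land in.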
There's lots more to it but this should get you started. If you have questions, the best place to ask them is the NICAR-L email list. News app developers tend to lurk on that list.