Eric Busboom may not be the kind of guy who comes to mind when you think of a librarian. But maybe he’s what a librarian should be as more of us rely on smartphones and computers to make sense of the world.
“Librarians have a stuffy image,” Busboom told me. “They’re amazingly forward-thinking. They’ve got an incredible service-oriented outlook. They aren’t beholden to books. They tend to think of their jobs as connecting the public to information, whatever that information is.”
In January, Busboom founded the San Diego Regional Data Library, to “make our social programs, civic groups, and government organizations more effective; our citizens better informed; and our policy makers able to make better decisions,” according to the library’s website.
Translation: The library is collecting information on crime, traffic patterns, streetlights, alcohol permits and other quality-of-life issues to help San Diegans learn more about one another and the communities they live in.
But Busboom doesn’t want to throw a “pile of files” online. He wants to help San Diegans understand what’s in those files. He wants to teach us how to make sense of that information with computer programs that are free for anyone to use.
You’ve been in the open data game for a while. How long exactly?
The library opened up operations in January. It’s been about nine months. But the previous year was spent in developing the concept. I spent a lot of time interviewing people. I probably talked to about 90 interview subjects so far across the country, so we’ve got a pretty good handle on how data works in a lot of other cities.
Looking into the future, what do you see for the San Diego Data Library?
Well, San Diego’s an interesting market for this. Our organization is what’s called a data intermediary, and the West Coast has very few. There’s a couple of data projects in Oregon and Seattle. But as far as this kind of an organization that really serves to be a conduit for data, the only group that exists on the West Coast is in Oakland.
But if you go to the East Coast, from D.C. up to Boston there’s dozens and dozens of them. And there’s a bunch around Dallas and Austin and a lot of other cities.
So we knew when we started the organization — I and the board members and the staff that I had working on it — that it was going to be a little different here. There’s a reason why the West Coast doesn’t have these things. And there’s something structurally different. So that’s been a real challenge.
Basically: Where do you find the money? It’s always difficult to fund a pure data project. It’s easier to fund something when you’re trying to solve a problem.
Whatever we come up with in San Diego will be unique to San Diego.
But so far that seems to be coming together. We figured out what the market is. There’s stuff that’s coming together, and the region is really opening up to how data-sharing works, and we’re a key part of that community.
When I talked to the guy from the independent budget analyst’s office [Deputy Director Jeff Kawar], he said that one of the things he was looking into is whether or not [open data] would be beneficial to everyone. He was sort of suggesting that perhaps this was being driven by commercial interests. How do you respond to that?
The commercial interests are one of the really important outputs of this. Not for people to make money. Whether somebody’s making a living off of it or not is not my particular interest. But I think if you look at what the federal government has been doing over the last 50 years, there’s an enormous commercial value to Census data in terms of marketing.
If you’re a business owner and you need that information, getting it from a company that can provide it to you is vastly cheaper and simpler than trying to extract it from the government.
The same thing is true with your GPS in your car, which operates off of GPS data and road data, which is largely public.
That commercial aspect is actually one of the end games. You know you’re successful when companies can take that data and do something useful with it. And those commercial uses in no way exclude the civic use of data.
And I think that having more commercial use creates an ecosystem where the civic use becomes more useful because you have all this other help in getting data.
I’d much rather buy some of the data sets that we use rather than spending weeks trying to fix it.
In our field, with reporting, [the nonprofit group] Investigative Reporters and Editors, they sell licenses to data sets they’ve cleaned up.
There’s a lot of people who are skeptical of how commercial interests and nonprofits work. And that’s one of the reasons our aim for the library is to make it a nonprofit.
Most data intermediaries around the country have a fee-for-service component and a grant component. And we expect that our fee-for-service is going to be most of our funding and the grant component will be very small. That’s just the way that San Diego and the West Coast work.
So we’re going to have to sell data. We’re going to have to sell something.
But it’s very important for me and, I think, for the viability in the long term to have a guarantee that there’s an entity that’s serving a civic goal, that we’ve defined what that civic and social interest for data is.
We’re going to try to make it as cheap as possible. We’ll give everything away for free if we can.
Compelling arguments have been made for keeping certain kinds of data private. Is there any data you can think of in San Diego County that probably should remain private for compelling reasons?
Oh, there’s lots of it. And that’s true in all of the public data, too. What you get from the Census data is all based on questionnaires and interviews. And the Census has your name and your address on all the questions you answered.
Generally, they release all that data disassembled and reaggregated so you can’t tell who’s who. That level of information, anything that’s personally identifiable, won’t make it out into the public.
What do you think is the most important data or data set that’s not open right now but ought to be?
There’s not really a lot that’s in the city or in the county that has broad-scale important value. With SANGIS [a data library managed by the city and county and hosted by the San Diego Association of Governments], we’ve actually got most of that. You can have complaints about the quality or how often it’s updated. But for the most part, the will to release that exists.
The biggest source of data that we don’t have that’s really socially valuable is in nonprofits. It’s social service data, information about homelessness and mental health. What’s going on at hospitals? What diseases do we have? It’s economic activity. There’s a lot of little things like that that’s really socially valuable but it’s not coming out of the city.
And that’s the kind of stuff that I think is really the next frontier of open data.
Clarification: This post has been updated to better reflect SANDAG’s role in SANGIS.