Sunday, May 03, 2009

DLF Aquifer Metadata Working Group "Lessons Learned" report available

That moment when a long-term project comes to an end is always simultaneously filled with relief and sadness. Relief in that new opportunities can be embraced and a pretty package placed around what was accomplished, with appropriate rationales for what didn't make its way into the package. Sadness in that productive and creative working relationships come to a close or change, and that there is always more to be done that cannot for practical reasons be embarked upon at this time.

The Digital Library Federation's Aquifer initiative wrapped up this spring, and it has me experiencing that moment of relief and sadness. (Well, to be honest, several moments!) I've been involved with Aquifer from the beginning, and during that time my relationship with it evolved from skepticism to "just jump in and see what you can do" to "bite off one reasonable chunk of a problem and do your best to make this chunk work with other chunks." A report the Metadata Working Group just released, "Advancing the State of the Art in Distributed Digital Libraries: Accomplishments of and Lessons Learned from the Digital Library Federation Aquifer Metadata Working Group," reflects that last approach, attempting to place our work in an ever-evolving context. There is much more that could have been done, and the limitations and benefits of relying on a volunteer committee for work like this are more evident to me now than ever. Nevertheless, I'm proud of the work this group did. Congratulations to all involved on successfully navigating through our many tasks.

The message I sent out about this report to various listservs included the following "thank you":
The Aquifer Metadata Working Group would like to thank all who have been involved with the initiative, including current and past Working Group members; the Aquifer American Social History Online project team; participants in ground-breaking precursor activities such as the DLF/NSDL OAI-PMH Best Practices; individuals and institutions who tested, implemented, and provided feedback on the Metadata Working Group's MODS Guidelines and other work products; and of course DLF for its ongoing support. It's been a wild, educational, and wholly enjoyable ride!
I can't state with enough gratitude the role the community has played in what the Aquifer Metadata Working Group was able to accomplish. I like to talk with those thinking of entering the digital library field about just how much of our work is figuring it out as you go - we're constantly refining models to apply to new types of material and take advantage of new technologies. My absolute favorite part about working in this area is navigating the tricky path of effectively building on previous work while pushing the envelope at the same time. I hope the Aquifer Metadata Working Group's contributions continue to be useful as building blocks for a long time to come.

Thursday, March 05, 2009

Must Watch! Michael Edson: "Web Tech Guy and Angry Staff Person"

I heard Michael Edson (Director of Web and New Media Strategy for the Smithsonian) speak at the IMLS WebWise conference last week. He delivered an astonishingly good talk centering around an animation entitled "Web Tech Guy and Angry Staff Person." It's a riot, and the animation sets a lighthearted tone that reinforces his disclaimer that he's not poking fun at or diminishing the very real tensions cultural heritage institutions face as our communication, collection, and even the dreaded B-word (business!) models change underneath us. Instead, I believe it effectively uses exaggeration to highlight some underlying issues and to get us thinking intelligently about what it takes to say we CAN do something rather than taking the easy road and saying no. We can't just dismiss the challenges - understanding them will help us address them.

Sunday, March 01, 2009

Google vs. Semantic Web

On a number of fronts recently I've been thinking a bunch about RDF, the DCMI Abstract Model, and the Semantic Web, all with an eye towards understanding these things more than I have in the past. I think I've made some progress, although I can't claim to fully grok any of these yet. One thing does occur to me, although it's probably a gross oversimplification. The difference between the Semantic Web/RDF approach and, say, the Google approach is this: is the robustness in the data or is it in the system?

The Semantic Web (et al) would like the data to be self-explanatory: to state explicitly what it is describing, with explicit reference to all the properties used in the description. At the opposite end of the spectrum are systems like Google, which assume some kind of intelligence went into the creation of the data but don't expect the data itself to explicitly manifest it. The approach of these systems is to reverse engineer that data, getting at the human intelligence that created it in the first place.
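To make that contrast a little more concrete, here's a rough sketch in Python. It is only an illustration under my own assumptions: the example.org URI, the sample sentence, and both function names are made up, and the "by Somebody" pattern is a toy stand-in for the far more sophisticated inference a real search engine does. The only real identifier is the Dublin Core creator property.

```python
import re

DC_CREATOR = "http://purl.org/dc/terms/creator"
DC_TITLE = "http://purl.org/dc/terms/title"

# Approach 1: robustness in the data. Each statement names its subject,
# property, and value explicitly. The subject URI is hypothetical.
triples = [
    ("http://example.org/book/1", DC_TITLE, "Moby Dick"),
    ("http://example.org/book/1", DC_CREATOR, "Herman Melville"),
]

def creator_from_triples(statements, subject):
    """Look the creator up directly; nothing has to be guessed."""
    for s, p, o in statements:
        if s == subject and p == DC_CREATOR:
            return o
    return None

# Approach 2: robustness in the system. The data is just prose, and the
# system has to reverse engineer what the author meant.
page_text = "Moby Dick, a novel by Herman Melville, first appeared in 1851."

def creator_from_text(text):
    """Toy heuristic: take the capitalized words that follow 'by'."""
    match = re.search(r"\bby ((?:[A-Z][a-z]+ ?)+)", text)
    return match.group(1).strip() if match else None

print(creator_from_triples(triples, "http://example.org/book/1"))
print(creator_from_text(page_text))
```

Both calls come back with the same name for this sample, but the first only works if somebody encoded the statement up front, and the second only works as long as the text happens to fit the pattern.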

The difference is one of who is expected to do the work - the system encoding the data in the first place (Semantic Web approach) or the system decoding the data for use in a specific application (Google approach). Both obviously present challenges, and it's not clear to me at this point which will "win." Maybe the "good enough and a person can go the last bit" approach really is appropriate - no system can be perfect! Or maybe as information systems evolve our standards for the performance of these systems will be raised to a degree where self-describing data is demanded. As a moderate, I guess I think both will probably be necessary for different uses. But which way will the library community go? Can we afford to have feet in both camps into the future?

Sunday, December 21, 2008

Wow.

This poor blog has been sorely neglected lately, and for that I apologize, both to you and to myself. Life has gotten a bit too crazy and I'm still trying to find a way to set some boundaries. But in the middle of several big work deadlines and several personal deadlines (including a 2000 mile road trip starting tomorrow, unexpectedly a day early!), I feel I have to take a minute to comment on this.

lcsh.info is no more.

Wow. I really don't know what to say. There's obviously a story behind this, and I know nothing of it. What I do know is that LC has been promising remote, machine-readable access to their authority files for YEARS now (SKOS is frequently mentioned and, if my memory serves, was cited [indignantly] during the leadup to the release of the report of the LC Working Group on the Future of Bibliographic Control as something LC is already working on, so stop harping on it already...), but such a thing, as Ed notes, has not come to pass. Taken in the context of the recent controversy over the change in OCLC's record use policy, one has to wonder what's up.

I know our library universe is complex. The real world gets in the way of our ideals. (Sure I can share my code! Just let me find some time to clean it up first...) But at some point talk is just talk and action is something else entirely. So where are we with library data? All talk? Or will we take action too? If our leadership seems to be headed in the wrong direction, who is it that will emerge in their place? Does the momentum need to shift, and if so, how will we make this happen? Is this the opportunity for a grass-roots effort? I'm not sure the ones I see out there are really poised to have the effect they really need to have. So what next?

I mean, wow.

Saturday, September 27, 2008

This week's revelation

Too many interesting things going on, too little time to put them into words that others can read...

Something has been stewing in my head for a long time about RDA, and this week I'm at the OLAC/MOUG joint conference where the topic has come up a bit. RDA is supposed to be "made for the digital world." This is something I can completely get behind. But the drafts I've read (and I admit I gave up on them at some point, so maybe this has changed) don't seem to me to be actually accomplishing that. It's the right goal, but the products I've seen don't meet it. And then it occurred to me: by "for the digital world" I think what the RDA folks actually mean is "catalog digital stuff" rather than "create data that can be used by machines as well as people." I'm interested in the latter, so that's what I was assuming they were interested in. But I'm now wondering if that assumption was false. If we've had this problem with terminology for this long within our own profession, how in the world are we going to communicate effectively with others?

Monday, July 07, 2008

I couldn't resist

I'm not one to participate in many blog memes, but seeing all the Wordle clouds out there, I just couldn't resist creating one for FRBR.


Wednesday, May 07, 2008

LC statement on RDA

I've long been on the fence with regard to the development of RDA - is it a transformative event or total folly? I think I've finally come to the opinion that RDA is overall a positive thing, and that it represents a necessary (although of course not perfect) step forward in the ongoing evolution of libraries.

What got me thinking about these issues again was a recent letter from Deanna Marcum at LC explaining why LC was issuing a joint statement with the National Library of Medicine and the National Agricultural Library outlining a testing and decision-making plan for determining whether or not to fully implement RDA. The letter and statement essentially say that wide participation in RDA development is a Good Thing (tm), yet so is substantive evaluation of it. Not much to argue with there. (Well, we always do find something to argue about, don't we?)

The stated goals of RDA, as well as its scope and underlying principles, speak to me strongly. I like the idea of a content standard written with FRBR principles in mind. The goal of making library description interoperate better in the current information environment outside of libraries is of course a laudable one. In this way, just by clearly stating these and a handful of others as the rationale behind the work being done, we've made a significant step forward. We're responding to the world as it exists around us today.

The world is changing, though. The environment today won't be the environment tomorrow. There's no indication, and perhaps even no real hope, that what we decide today will be right in a year, three years, ten. That's a reality we have to face, and I've decided I'm in the camp that says we have to move forward anyways, analyzing the risk but not being afraid of it. Looking at RDA through this lens, will it meet the goals it has outlined? Probably not. I see much in the current drafts that doesn't demonstrate the overall goals well. But we've never done this before, at least not in this way. We're learning. We're going to make mistakes. The stakes are admittedly high, but they're also high if we don't act. RDA has already evolved from community input, and I suspect it will continue to do so. Maybe it doesn't even stick around that long - maybe we learn enough from writing and trying to implement it that another round is warranted with some key needed improvements. We've invested many resources in this, but that's part of life as well. Many things don't pan out, and that's certainly not unique to the library world. I realize our resources are scarce, but they're going to be zero soon if we don't think creatively. I think RDA is an attempt to do that.

I'm still concerned that RDA as a content standard is stepping too far in the direction of a structure standard for my taste. It's explicitly defining "elements" whereas for content standards I like to think of "classes of elements" to help us remember that instructions in a content standard aren't necessarily a 1:1 match with fields in a data record - this is what enables us to mix and match content and structure standards as we see fit. But I'm the first to admit that the distinction between a structure and a content standard is an artificial one, and that any given standard can blur the line a bit. My concern still lingers, however - the RDA Scope & Structure document uses "elements" and "properties" interchangeably, but I believe these terms, even in the context given here, have very different connotations. We'll see, I suppose, whether my concerns are valid. Maybe I'm just being pedantic about terminology. Or maybe there's a fundamental conceptual problem here. I'm a pragmatist - I realize the only way we're going to find out is to try it.
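Here's a small sketch of what I mean by instructions not being a 1:1 match with fields, written in Python. Everything in it is my own illustration rather than anything taken from RDA: the whitespace-normalizing "rule" is a hypothetical stand-in for a real content instruction, and the two output shapes are simplified stand-ins for a Dublin Core record and a MARC 245 field with its $a and $b subfields.

```python
def formulate_title(raw_title, raw_subtitle=None):
    """Hypothetical content rule: keep the wording as found, tidy whitespace."""
    def clean(value):
        return " ".join(value.split())

    info = {"title": clean(raw_title)}
    if raw_subtitle:
        info["subtitle"] = clean(raw_subtitle)
    return info

def to_dublin_core(info):
    """Structure A: the whole class of title information lands in one field."""
    parts = [info["title"]]
    if "subtitle" in info:
        parts.append(info["subtitle"])
    return {"dc:title": ": ".join(parts)}

def to_marc_like(info):
    """Structure B: the same information is split across two subfields."""
    field = {"a": info["title"]}
    if "subtitle" in info:
        field["b"] = info["subtitle"]
    return {"245": field}

title_info = formulate_title("  Moby   Dick ", "or, The Whale")
print(to_dublin_core(title_info))
print(to_marc_like(title_info))
```

The content decision (how the title is formulated) is made once, and the structure decision (one field or two) is made separately for each record format. That separation is what I worry gets harder once a content standard starts naming individual "elements."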