Wu2008:Users Meeting Blog

From CONTENTdm Users Wiki

Contributor

Stewart Baker, Web Services Librarian at CSU Dominguez Hills.

Infomancy.net, my personal page/blog

Notes and Presentation Downloads

I took excessive notes, which you can download from my personal page in OpenOffice, MS Word, or plain-text format if you wish.

PDF versions of all presentations are also available on the CONTENTdm Users Group Meeting website. Just click on the "Agenda" tab to download them.

Pre-conference

I did not attend either of the two training sessions. If anybody has information about them to share, feel free to add it in.

Day 1, June 5th

Session 1: Digital Projects from A-Z

Claremont University Consortium - Lindley Scrapbook Collection

The collection was a series of scrapbooks that were poorly indexed and needed preservation attention because of their bindings, glues, etc.

Some digitization/presentation issues: The organization and structure of the scrapbooks varied, with multiple types of items (photos, text, pamphlets, etc.) and items in various orientations on the page. Some items even spread over multiple pages. This meant decisions had to be made about how to scan and present the items in digital form.

Resolution: Each page was scanned and split into the individual items on it. These items were then uploaded into CONTENTdm with custom thumbnails, because the automatically generated thumbnails were distorted.

Statistics:

  • Digitization began September 2007
  • One full-time staff member and two students
  • The full-time staff member could scan 22 pages, crop 65 items, or fill out metadata for 17 items per hour.
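
To see what those rates mean for planning, here is a rough estimator. It's a sketch that assumes one person does all three tasks in sequence at exactly the reported rates, with no time allowed for QA, uploading, or breaks; the page and item counts in the example are made up.

```python
# Hourly rates reported for the Lindley scrapbook project (full-time staff member).
SCAN_PAGES_PER_HOUR = 22
CROP_ITEMS_PER_HOUR = 65
METADATA_ITEMS_PER_HOUR = 17


def estimate_staff_hours(pages: int, items: int) -> float:
    """Rough hours for one person to scan `pages`, then crop and describe `items`.

    Assumes the tasks are done one after another at the reported rates,
    with no allowance for QA, uploading, or breaks.
    """
    scanning = pages / SCAN_PAGES_PER_HOUR
    cropping = items / CROP_ITEMS_PER_HOUR
    metadata = items / METADATA_ITEMS_PER_HOUR
    return scanning + cropping + metadata


if __name__ == "__main__":
    # Hypothetical scrapbook: 110 pages containing 260 separate items.
    print(f"{estimate_staff_hours(110, 260):.1f} hours")  # 5 + 4 + ~15.3
```

For that hypothetical scrapbook the estimate is about 24.3 hours, and metadata entry accounts for well over half of it.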

Washington State Library & Participating County Libraries - Washington Rural Heritage, A Collaborative Digitization Project

The project involves a number of collections from various small/rural public libraries in the state of Washington, and is organised by the Washington State Library.

Logistics: The CONTENTdm server is hosted by the Washington State Library, and Acquisition Stations are installed in each participating library. Each library has its own collection on the server, and a "dark" archive is also kept for preservation purposes. A localised version of CDP Dublin Core is used for metadata, and Western States best practices are followed for digitization. The project was initially funded by an LSTA grant, with the idea that individual libraries will get grants on their own in the future.

Planning: Surveys were sent to public libraries that fell within the scope of the project. Survey questions asked whether libraries had historical/special collections, were willing to collaborate, and were willing to sustain the collections on their own.

Statistics:

  • 13 member libraries
  • 1 collection currently live
  • 6 collections currently in progress
  • 6 collections planned

Project Links:

Stevens County Libraries:

  • Kettle Falls Library Historical Collection - Historical photos from the Kettle Falls Library, being digitized and catalogued with assistance from the State Library. It is hoped that this collection will lend credibility to the library for future projects.
  • Stevens County Heritage Project - Actively gathering historical documents and photos from local sources and digitizing them for preservation and wide access. Problems included gathering "metadata" in the form of stories from the people who provided the materials, and some people not wanting their family photographs on the internet for anyone to see. A partnership with a local heritage network added credibility and reduced the workload, as the network had previous digitization experience. Face-to-face PR was an important part of gathering materials, especially given the rural setting.

San Juan Island, Jim Crook Collections:

  • Jim Crook was a "True Island Original", one of the pioneers on the island, who lived from 1873 to 1967 and is an important part of local history.
  • The project initially partnered with a local historical society and town, but the partnerships had some issues.
  • Problems included time-consuming item selection and copyright clearance, difficulty locating and identifying some items, and problems with collaboration.

Archetype Digital Imaging - Large Format Scanning

Overview: Not all items fit on a flatbed scanner, so it's important to know how to scan or photograph items that are too large, too bulky, too fragile, or otherwise unsuitable for a flatbed.

Key points:

  • 600 ppi should not be a universal specification - It's okay for smaller items (e.g. a page of a book), but causes problems with larger ones (e.g. a 48" × 70" poster).
  • Make sure to get the right camera (or scan-back) for the job.

Options, Choices, and Compromises: PPI of files, available equipment, production efficiency and speed, versatility, purpose of digitization, staff availability, and budget. Of course, the more money you're willing to spend, the better your final product will be.

Resolution: Image resolution is important because it is set at the time of scanning and cannot be increased afterwards; upsampling adds pixels but not detail.

Some Terms:

  • Pixel - Picture Element; a single square block in the image
  • Pixel Dimensions - pixels high * pixels wide
  • Megapixel - physical pixels on the sensor (pixels high * pixels wide / 1 million)
  • File Size in MB - total number of pixels in the file times colour channels (Megapixels * 3 for an 8-bit RGB file)
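
Working those definitions through for the two kinds of item mentioned above shows why 600 ppi can't be a blanket rule. This is a minimal sketch, assuming an uncompressed file with 8 bits (1 byte) per colour channel and 1 MB = 1,000,000 bytes; the 6" × 9" book page is a hypothetical size, while the 48" × 70" poster is the example from the talk.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixels wide and high for an original of the given size scanned at `ppi`."""
    return round(width_in * ppi), round(height_in * ppi)


def megapixels(width_px: int, height_px: int) -> float:
    """Megapixels = pixels high * pixels wide / 1 million."""
    return width_px * height_px / 1_000_000


def file_size_mb(mp: float, channels: int = 3, bytes_per_channel: int = 1) -> float:
    """Uncompressed size in MB: megapixels * colour channels * bytes per channel.

    The default of 1 byte per channel assumes an 8-bit-per-channel file;
    a 16-bit file doubles the result.
    """
    return mp * channels * bytes_per_channel


if __name__ == "__main__":
    for name, (w, h) in {"book page": (6, 9), "poster": (48, 70)}.items():
        w_px, h_px = pixel_dimensions(w, h, 600)
        mp = megapixels(w_px, h_px)
        print(f"{name}: {w_px} x {h_px} px, {mp:,.1f} MP, {file_size_mb(mp):,.1f} MB")
```

The book page comes out at about 58 MB, while the poster is roughly 1.2 gigapixels and about 3.6 GB before any compression.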

Session 2 - Migration & Data Wrangling

University of Washington - The Promise and Pitfalls of merging collections

A brief caveat: UW is a small digitization effort with no programmers on staff. Some of the problems below could probably have been solved more easily at an institution with programming staff.

Some History leading to the problem:

  • CONTENTdm was actually created at UW by the electrical engineering department in the late 1990s.
  • The first collections (around 1998) came from Special Collections, and each new CDm collection was set up to mirror a corresponding physical collection.
  • New collections were created in CDm whenever it was deemed necessary.

Anne says UW is a "Poster Child" for poor planning, and implores you not to follow their methods (or lack thereof) when planning digitization efforts.

The Problem:

  • UW now has about 156 different collections in CDm.
  • Some of these collections are extremely similar (e.g. 2 collections of photographs from the same photographer, but with different subject matter).

The "Promises":

  • Merge similar collections into new, bigger collections.
  • Collections created by students, faculty, or off-campus collaborators would not be touched. However, there were some 58 collections from Special Collections that could be candidates for merging.

The "Pitfalls":

  • The order of field display is sometimes important (and differs across collections)
  • Non-standardized metadata leads to different names for identical fields; these have to be mapped to a standard field name or included as is.

Benefits and Drawbacks:

  • Benefits - buy time before hitting the 200-collection limit; standardize collections and replan.
  • Drawbacks - URL changes; different but similar field names cause confusion; field presentation order not easily controlled; lots of work to "wrangle" the data

Final Decision: Moving an item into a different Cdm collection changes its URL, so the hyperlink to a merged item would no longer be the same as the original. Because of this, UW decided not to merge existing collections that users may have bookmarked. However, new collections will be merged into existing collections when possible.

Conclusion: UW should “serve as a warning to others... think what will happen when you get big”

Oklahoma State University - Migrating an IR: Batch loading for newbies

Background:

  • OSU established an Institutional Repository in 2005.
  • Purchased a commercial product and loaded it with around 9,000 items.
  • IR contained ETDs, Regents Professors, an e-journal, and "University Research"

IR Migration:

  • A change of vendor meant a change in functionality, and other software had to be evaluated.
  • There was a trial of Cdm in the summer of 2007, and it was selected for migration in September.
  • Few of the items in the IR were full-text documents. Most were URLs which were routed through a proxy so that only faculty, staff and students could access them.

The vendor sent the data via FTP. Some of the fields were extraneous for the new system and had to be deleted, which took a lot of time. The URL/file name also had to be moved to the last field, which took more time.

Images were batch-loaded through the Acquisition Station's thumbnail manager. URLs were batch-loaded by importing multiple files, choosing the tab-delimited import type, and then selecting "URL" as the file type when adding them in the Acquisition Station.

Professors have their own hand-coded web pages, which link into Cdm.
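
Both of those clean-up steps can be scripted rather than done by hand. The sketch below assumes a tab-delimited export with a header row; the column names (`vendor_id`, `date_loaded`, `url`) are hypothetical stand-ins, since the notes don't say what the vendor's fields were called.

```python
import csv
import io

# Hypothetical column names: the notes don't say what the vendor's fields were called.
EXTRANEOUS = {"vendor_id", "date_loaded"}   # fields the new system doesn't need
FILE_COLUMN = "url"                         # URL/file-name field, moved to the last position


def prepare_for_acquisition_station(tsv_text: str) -> str:
    """Drop unwanted columns and move the URL/file-name column to the end."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    if FILE_COLUMN not in fieldnames:
        raise ValueError(f"no {FILE_COLUMN!r} column in header")
    out_fields = [f for f in fieldnames if f not in EXTRANEOUS and f != FILE_COLUMN]
    out_fields.append(FILE_COLUMN)

    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=out_fields, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = ("vendor_id\turl\ttitle\tcreator\n"
              "17\thttp://example.org/a\tA title\tSmith\n")
    print(prepare_for_acquisition_station(sample), end="")
```

For the sample, the output has the header `title, creator, url` (tab-separated) with the vendor ID gone and the URL in the last column.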

Lunch

Involved:

  • sandwiches (okay)
  • potato salad (okay)
  • a piece of fruit (you can't mess up fruit)
  • cookies (disturbingly coloured, but okay)

Session 3 - Beyond the box: Customizations & Interfaces, Pt. 1

University of Nevada, Las Vegas – Visualizing map retrieval: searching CONTENTdm collections with ISIS, an interactive spatial image searching tool

Overview: ISIS (Interactive Spatial Image Search) adds "spatial search" functionality to enhance map collections. Textual search can sometimes be counter-intuitive when searching map collections; ISIS seeks to resolve this issue by adding to Cdm the ability to search by clicking on a map.

Some Shortcomings of textual search:

  • Different place names for the same place
  • Changing place names over time
  • Difference in languages
  • Inconsistency in Cataloguing

Spatial search resolves these issues by leaving language out of the picture.

The ISIS tool gives you a map with a “search box” that you can drag around and re-size to select your search area. Then at the bottom you can choose what type of search to perform (within, at least within, or outside the selected area), which collections to look in, and other miscellaneous preferences.
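
The three search types boil down to simple bounding-box tests. This is a sketch, not ISIS's code: it reads "at least within" as "at least partly within" (an assumption), uses made-up map boxes, and ignores boxes that cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees (west/east longitude, south/north latitude)."""
    west: float
    south: float
    east: float
    north: float


def within(item: BBox, area: BBox) -> bool:
    """Item lies entirely inside the selected search box."""
    return (area.west <= item.west and item.east <= area.east
            and area.south <= item.south and item.north <= area.north)


def overlaps(item: BBox, area: BBox) -> bool:
    """Item shares at least some area with the search box."""
    return (item.west <= area.east and area.west <= item.east
            and item.south <= area.north and area.south <= item.north)


def outside(item: BBox, area: BBox) -> bool:
    """Item does not touch the search box at all."""
    return not overlaps(item, area)


# "at least within" is mapped to overlaps(): an assumed reading of the ISIS option.
MODES = {"within": within, "at least within": overlaps, "outside": outside}


def search(items: dict[str, BBox], area: BBox, mode: str) -> list[str]:
    test = MODES[mode]
    return sorted(name for name, box in items.items() if test(box, area))


if __name__ == "__main__":
    maps = {  # hypothetical maps
        "city plan": BBox(-115.3, 36.0, -115.0, 36.3),
        "state map": BBox(-120.0, 35.0, -114.0, 42.0),
        "coast chart": BBox(-125.0, 32.0, -121.0, 38.0),
    }
    area = BBox(-116.0, 35.5, -114.5, 36.5)
    for mode in MODES:
        print(mode, search(maps, area, mode))
```

With that search box, "within" finds only the city plan, "at least within" adds the state map, and "outside" returns the coast chart.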

Technical Aspects:

  • SQL queries with OAI-PMH and an installed Database
  • Uses Scalable Vector Graphics (SVG format)
  • Map sends searches via HTTP GET
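
Put together, a search request might look something like this. ISIS itself is PHP talking to an MDB2-compatible DBMS; this sketch uses Python's built-in `sqlite3` purely for illustration, and the `records` table, its columns, and the `west`/`south`/`east`/`north` GET parameters are hypothetical. It also assumes the bounding boxes have already been harvested (e.g. over OAI-PMH) into the table, and it only implements the "within" case.

```python
import sqlite3
from urllib.parse import parse_qs

# Hypothetical table: one row per harvested record with its bounding box (decimal degrees).
SCHEMA = """
CREATE TABLE records (
    identifier TEXT PRIMARY KEY,
    title TEXT,
    west REAL, south REAL, east REAL, north REAL
)"""

# "Within": the record's box lies entirely inside the requested box.
WITHIN_SQL = """
SELECT identifier, title FROM records
WHERE west >= :west AND east <= :east AND south >= :south AND north <= :north
ORDER BY identifier"""

BOX_PARAMS = ("west", "south", "east", "north")


def search_within(conn: sqlite3.Connection, query_string: str) -> list[tuple[str, str]]:
    """Parse box edges from a GET query string and run the 'within' query."""
    parsed = parse_qs(query_string)
    missing = [p for p in BOX_PARAMS if p not in parsed]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    # float() rejects non-numeric input; named placeholders keep values out of the SQL text.
    params = {p: float(parsed[p][0]) for p in BOX_PARAMS}
    return conn.execute(WITHIN_SQL, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("oai:example:1", "City plan", -115.3, 36.0, -115.0, 36.3),
            ("oai:example:2", "State map", -120.0, 35.0, -114.0, 42.0),
        ],
    )
    # e.g. the query string from a hypothetical GET /search?west=-116&south=35.5&east=-114.5&north=36.5
    print(search_within(conn, "west=-116&south=35.5&east=-114.5&north=36.5"))
```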

ISIS Technical Requirements:

  • PHP 5.1 or greater
  • Unix-based OS (Windows will probably work but hasn't been tested)
  • MDB2-compatible DBMS (e.g. MySQL)

ISIS "Know-how" Requirements:

  • SQL knowledge
  • PHP knowledge
  • HTML editing ability for customization
  • Acquiring and Adding Geo-data to the map
  • Converting into and editing SVG format images
  • Ability to convert maps into equirectangular projection
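
The equirectangular requirement makes more sense once you see the maths: in that projection longitude and latitude map linearly onto x and y, so converting between coordinates and positions on an SVG image takes a couple of lines. This is a sketch, assuming a map image whose edges are known in decimal degrees; the 1000 × 500 world map in the example is hypothetical.

```python
Bounds = tuple[float, float, float, float]  # (west, south, east, north) in decimal degrees


def lonlat_to_svg(lon: float, lat: float, bounds: Bounds,
                  width: float, height: float) -> tuple[float, float]:
    """Project a point onto an equirectangular map image of the given size."""
    west, south, east, north = bounds
    x = (lon - west) / (east - west) * width
    y = (north - lat) / (north - south) * height  # SVG y grows downwards
    return x, y


def svg_to_lonlat(x: float, y: float, bounds: Bounds,
                  width: float, height: float) -> tuple[float, float]:
    """Inverse: turn a position on the image (e.g. a corner of the search box) into lon/lat."""
    west, south, east, north = bounds
    lon = west + x / width * (east - west)
    lat = north - y / height * (north - south)
    return lon, lat


if __name__ == "__main__":
    world = (-180.0, -90.0, 180.0, 90.0)
    print(lonlat_to_svg(0.0, 0.0, world, 1000, 500))   # (500.0, 250.0)
    print(svg_to_lonlat(250, 125, world, 1000, 500))   # (-90.0, 45.0)
```

For any other projection the mapping is no longer linear and would need proper projection maths, which is presumably why ISIS supports only this one.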

Some Issues with ISIS:

  • Map metadata often doesn't include spatial information by default
  • IE needs a plug-in to display SVG images
  • Other browsers have poor SVG support (though this is improving)
  • Only supports equirectangular projection
  • Some know-how required

Lafayette College – Organizing distributed metadata creation with MetaDB

Overview: MetaDB was created in 2006 to make the task of distributing and editing metadata easier. It was created to add flexibility to the process, so that multiple people could work on different parts of metadata from different parts of campus at the same time. MetaDB is still a work in progress, but is currently in use at Lafayette.

The Warner project is "a photographic record of a US consul's impressions of urban and rural life in Taiwan under Japanese colonial rule. Totaling 340 photographs and postcards gathered by Warner between August 26, 1937 and March 8, 1941". It served as a good "testbed" for MetaDB, since it had a widely dispersed group of metadata creators and editors working on it at the same time.

The Process:

  • Users enter metadata through an HTML form
  • Metadata is distributed to the person in charge of verifying it
  • Metadata is exported into a format that Cdm will recognise
  • Metadata is batch-loaded into Cdm via the Acquisition Station
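
The hand-off between those steps is essentially a status on each record. MetaDB's actual internals weren't covered in the talk, so this is only a hypothetical sketch of the idea: records start as entered, a verifier marks them verified, and only verified records are written out as tab-delimited text for batch loading. The identifiers, titles, and column names are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ENTERED = "entered"    # submitted through the web form
    VERIFIED = "verified"  # approved by the person in charge of checking it
    EXPORTED = "exported"  # written out for Acquisition Station batch loading


@dataclass
class Record:
    identifier: str
    fields: dict[str, str]
    status: Status = Status.ENTERED


def verify(record: Record) -> None:
    """Mark an entered record as checked; refuse anything at another stage."""
    if record.status is not Status.ENTERED:
        raise ValueError(f"{record.identifier} is not awaiting verification")
    record.status = Status.VERIFIED


def export_batch(records: list[Record], columns: list[str]) -> str:
    """Tab-delimited text of verified records only; marks those records as exported.

    Assumes field values contain no tabs or newlines.
    """
    lines = ["\t".join(columns)]
    for record in records:
        if record.status is Status.VERIFIED:
            lines.append("\t".join(record.fields.get(c, "") for c in columns))
            record.status = Status.EXPORTED
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    batch = [
        Record("item-001", {"title": "Street scene", "date": "1938"}),
        Record("item-002", {"title": "Harbour", "date": "1939"}),
    ]
    verify(batch[0])
    print(export_batch(batch, ["title", "date"]), end="")
    # Only item-001 is exported; item-002 is still waiting for its verifier.
```

Keeping verification as an explicit state is what lets several people work at their own pace: nothing reaches CONTENTdm until someone has signed off on it.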

Workflow Benefits:

  • Allows digitization tasks to be shared with experts across campus and off-site
  • Allows multiple users to work concurrently and at their own pace
  • Allows us to automate where possible, simplifying data entry and avoiding errors
  • Provides helpful interface for editing and storing existing metadata

The Warner project has grown immensely since it was first put on the web. An additional 1,500 similar items were donated by somebody who saw the project on a blog, and the Warner family has donated Warner's entire collection of photographic negatives.

Session 4 - Beyond the Box: Customizations & Interfaces, Pt. 2

I did not attend this session. If anybody has information to add, please add it.

Day 2, June 6th

CONTENTdm Update

New with CONTENTdm:

  • JPEG2000 licensing has changed: there are no restrictions, all stations are now JPEG2000-enabled, and there is no extra charge and no separate AMA fee. All organizations should have received a letter detailing these changes.
  • Support FAQ launched March 2008 - answers the most commonly received questions, with new articles added every month.
  • A new CONTENTdm PowerPoint plug-in is in the final stages of release and should be out soon.

Currently in development:

New Acquisition Station coming next:

  • More Robust, new features
  • Option to schedule automatic indexing
  • More approval options
  • Expanding Connexion digital import
  • Improved EAD handling.

Long-term goals:

  • Upgrade web interface
  • Expand XML import
  • Improved WorldCat harvesting
  • Streaming media support for hosted users

Session 5 - Rights Management for Institutional Repositories

University of Utah - University Scholarly Knowledge Inventory System (U-SKIS)

Overview: U-SKIS was developed at the University of Utah to make loading copyrighted items into the Institutional Repository easier. It does this by keeping track of which publishers allow which kinds of republication, along with other useful information.

Reasoning: When populating their IR, UoU ran into difficulties because most of the items they were loading were not new author submissions but items already published in peer-reviewed journals. This led to problems tracking down the individual publishers and asking them about their policies on republication. It also made workflow distribution difficult.

U-SKIS: U-SKIS resolves these problems by keeping a record of the various publishers' policies (about 400 publishers currently in the database), tracking item records and PDF file locations, and keeping records of communications with publishers.
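
The core idea (record each publisher's policy once, then look it up for every item from that publisher) can be sketched in a few lines. This is not U-SKIS's schema; the policy fields and the publisher names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublisherPolicy:
    publisher: str
    allows_publisher_pdf: bool   # may the IR post the publisher's own PDF?
    allows_postprint: bool       # may it post the author's accepted manuscript?
    embargo_months: int = 0


def deposit_decision(policies: dict[str, PublisherPolicy], publisher: str) -> str:
    """Say what the repository may post for an item from `publisher`."""
    policy = policies.get(publisher)
    if policy is None:
        return "no policy on file: contact publisher"
    if policy.allows_publisher_pdf:
        version = "publisher PDF"
    elif policy.allows_postprint:
        version = "author postprint"
    else:
        return "metadata only"
    if policy.embargo_months:
        return f"{version} after {policy.embargo_months}-month embargo"
    return version


if __name__ == "__main__":
    policies = {
        p.publisher: p
        for p in [
            PublisherPolicy("Publisher A", allows_publisher_pdf=True, allows_postprint=True),
            PublisherPolicy("Publisher B", allows_publisher_pdf=False, allows_postprint=True,
                            embargo_months=12),
        ]
    }
    for name in ["Publisher A", "Publisher B", "Publisher C"]:
        print(f"{name}: {deposit_decision(policies, name)}")
```

Only the unknown publisher triggers a "contact publisher" result, which is the point: each publisher has to be tracked down once, not once per article.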

A demo was given showing how to use the various aspects of U-SKIS. U-SKIS is open source, so anyone can use it for free.

Claremont University Consortium - Implementing USKIS at Claremont University Consortium

Overview: The Claremont University Consortium installed USKIS on their campus, but had to make some modifications due to differing workflow and environment (technical as well as campus-wide). There were no major problems in making these changes.

Caveat: If you are giving a presentation on technology, never say "It's working!" This will instantly cause the overhead projector to shut down for about 10 minutes. At least, that's what happened in this case.

Integration Issues:

  • No "campus"-wide LDAP at Claremont
  • Campus environment totally different from the University of Utah's
  • Different CONTENTdm management environment
  • Different server structure

Some changes made:

  • User authentication by cookies instead of LDAP
  • PHP instead of Perl
  • Changes to reflect the different campus structure of CUC

Revision and Expansion:

  • New copy/SFTP protocols added for data transfer
  • Modified code to support IE7
  • Customized image buttons
  • Shared experience of integrating USKIS with different structure
  • Shared new publisher records/info
  • Reusable code helped in developing the senior thesis interface

Session 6 - Collection Visibility

Claremont University Consortium - Getting the Word out with metadata harvesters and registries

Overview: This presentation focused on how Claremont advertises its new digital collections.

CCDL (Claremont Colleges Digital Library) uses a number of methods to advertise new collections:

  • WorldCat
  • OAIster
  • Word of Mouth
  • Listserv notices
  • Library announcements
  • Adding collections to the library OPAC
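
Harvesters such as OAIster pick collections up over OAI-PMH. Below is a minimal sketch of the request a harvester sends and how it might read titles out of the Dublin Core response. The base URL and set name are hypothetical, and a real harvester would keep requesting pages until the `resumptionToken` comes back empty.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for simple Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_titles(response_xml: str) -> list[str]:
    """Pull dc:title values out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text or "" for el in root.iterfind(".//dc:title", NS)]


def resumption_token(response_xml: str) -> Optional[str]:
    """Token for the next page, or None when the list is complete."""
    token = ET.fromstring(response_xml).find(".//oai:resumptionToken", NS)
    return token.text if token is not None and token.text else None


if __name__ == "__main__":
    print(list_records_url("https://digital.example.edu/oai", "scrapbooks"))
```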

Some useful resources for getting your collections "out there":

Useful listservs:

  • Cdm list
  • Digipres List
  • Diglib List
  • Imagelib list

University of Washington - CONTENTdm, Wikipedia and Flickr: What happens when they play together?

Overview: UW increased access to their collections by putting links to them into relevant Wikipedia pages. This caused some contention in the Wikipedia community, but was eventually accepted (by most). The experiment was a success and drastically increased page hits. UW tried the same thing with Flickr but that didn't work as well.

UW Intention:

  • Not to edit Wikipedia pages
  • Add their digital collections in link form to relevant Wikipedia pages as neutral POV sources

UW Process:

  • Analyze articles to determine the main subject
  • Make sure your link is clearly relevant to the subject (e.g. don't put a link to information about present-day California in a page about the California Gold Rush)

Some speed-bumps in the UW process:

  • UW did not initially have a user account
  • UW added a lot of external links in a short time
  • Both of these factors combined set off red flags with Wikipedia editors watching for spam

Outcome of the UW Wikipedia experiment:

  • Referrals from Wikipedia went up by almost 2,000 hits over the October-July period
  • Wikipedia is now the main non-search-engine referrer to UW collections

Some Maintenance Required:

  • Because of Wikipedia's constantly changing nature, you may have to continually check that your links are in the articles where you initially placed them
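
Checking by hand gets tedious once you have links on dozens of articles. One way to automate it, as a sketch: the MediaWiki API's `action=parse` with `prop=externallinks` returns the external links on a page, and a small function compares them to the links you placed. The item URL and User-Agent contact below are hypothetical, and the comparison is an exact string match, so a link someone rewrote (e.g. `http` to `https`) will show up as missing.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def external_links_url(title: str) -> str:
    """MediaWiki API request for the external links on one article."""
    params = {"action": "parse", "page": title, "prop": "externallinks",
              "format": "json", "formatversion": "2"}
    return f"{API}?{urlencode(params)}"


def fetch_external_links(title: str) -> list[str]:
    """Fetch the article's external links (needs network access)."""
    # Wikimedia asks API clients to identify themselves with a descriptive User-Agent.
    req = Request(external_links_url(title),
                  headers={"User-Agent": "link-check-sketch/0.1 (you@example.edu)"})
    with urlopen(req) as resp:
        return json.load(resp)["parse"]["externallinks"]


def missing_links(placed: dict[str, str], links_by_title: dict[str, list[str]]) -> list[str]:
    """Titles whose article no longer contains the exact URL we placed there."""
    return [title for title, url in placed.items()
            if url not in links_by_title.get(title, [])]


if __name__ == "__main__":
    # Offline example with hypothetical data; use fetch_external_links() for live pages.
    placed = {"California Gold Rush": "http://digital.example.edu/item/123"}
    print(missing_links(placed, {"California Gold Rush": []}))  # ['California Gold Rush']
```

To check live pages, call `fetch_external_links(title)` for each article you've edited and pass the results to `missing_links`.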

How to make your Wikipedia links last longer:

  • Make sure your link is clearly relevant to the Wikipedia page
  • Add a succinct (one sentence max) description of your link
  • Sign up for a Wikipedia account. This will give you "my contributions" and "my watchlist" pages that let you see changes made to pages you've edited, and will also make the links you add less likely to be flagged as spam.

The Flickr Experiment: This experiment, which was less successful, involved adding images with no copyright issues to Flickr, a social image-sharing network. The images were added with minimal metadata and a link back to the item in CONTENTdm. UW thinks the experiment was less successful because they did not interact with the Flickr community as much as they could have.

Flickr Results:

  • 0 referrals from Flickr back to Cdm
  • This is probably due to the nature of Flickr, where users can simply view the pictures on Flickr itself


Open Session - Developer's Discussion

I didn't attend this session. If anybody has information, please add it here.