BlueStream dev notes
From openmichigan
Ali, Emily, Mike and Piet met with Rob and Harmeet of BlueStream to discuss some of the technical questions we had about working with their platform. The goal of the meeting was to do the following:
- discuss the technical capabilities of BlueStream for the use cases of a) BStream as a content repository for both content in processing and published content and b) BStream as a video processing machine;
- learn more about the API(s), what they can handle, and what is pre-built;
- get a better understanding of how this might scale (from a technical point of view).
Video repository
Streaming
The original is uploaded, proxied, and then sent to the streaming server.
Proxying is the term used for transcoding the original high quality version to lower quality formats so that lower fidelity versions can be provided in several formats.
Content streams through the ITCS streaming service, so our capacity would depend on how many connections and which streaming formats the ITCS infrastructure supports. I did suggest that running out of capacity due to demand for our content is a problem that we would love to have, and we are nowhere close to stretching the current infrastructure.
Uploading
BlueStream ingests high quality video (it does not do uncompressed HD - Ali rolled his eyes at this. He also suggested that 9GB/min was not excessive disk usage and they should consider supporting it). Essentially you can use their web interface to upload files one at a time (wouldn't want to do this for more than a couple - it's beyond unusable) or you can upload using their 'hot folder' and attach metadata either before or after the upload through an XML file.
The "hot folder" is a CIFS/SMB share (Windows File Sharing protocol) that is mounted on the uploader's desktop. It is periodically checked for new content. This system may not work for OERca users since it is unclear whether the CIFS/SMB share is visible off campus.
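A minimal sketch of what a scripted hot-folder drop could look like, assuming the share is already mounted at a local path. The sidecar layout (an `asset` root with `field` children) and the `<filename>.xml` naming are assumptions made up for illustration; the actual metadata XML format BlueStream expects was not covered in the meeting.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def drop_into_hot_folder(source: Path, hot_folder: Path, metadata: dict) -> Path:
    """Copy one file into the mounted share and write a metadata sidecar next to it."""
    target = hot_folder / source.name

    # Hypothetical sidecar schema: <asset file="..."><field name="...">value</field></asset>
    root = ET.Element("asset", {"file": source.name})
    for name, value in metadata.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = str(value)

    # Write the sidecar first so the metadata is already there when the folder is polled.
    sidecar = target.with_name(target.name + ".xml")
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)

    shutil.copy2(source, target)
    return sidecar
```

Since metadata can apparently be attached before or after the upload, the order of the two writes here is a choice, not a requirement.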
Apparently we can write our own batch uploader (and interface) through the API, and define what metadata is attached with it. It's currently unclear whether BlueStream supports custom metadata schemas and, if so, how they are defined. We will likely have to ask again once we figure out a baseline for the initial taxonomies/namespaces we want to use. It is also important that changing the schema later not be hard, since I doubt we'll get it right on the first go. We would need to build our own zip file processor if we wanted to upload compressed assets, and the batch upload functionality may require some wrestling, since one can't plug archive unpacking into the current asset processing workflow.
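Because archive unpacking can't be plugged into BlueStream's own workflow, a zip processor would have to run on our side before anything is uploaded. A rough sketch of that step, where the `upload` callback is a hypothetical placeholder for whatever the real API call turns out to be:

```python
import zipfile
from pathlib import Path
from typing import Callable, List


def upload_archive(archive: Path, workdir: Path,
                   upload: Callable[[Path], None]) -> List[str]:
    """Unpack a zip archive locally and hand each extracted file to an upload callback."""
    uploaded = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no asset data
            # extract() strips absolute paths and ".." parts from member names
            extracted = Path(zf.extract(info, path=workdir))
            upload(extracted)  # hypothetical: stands in for the real API call
            uploaded.append(info.filename)
    return uploaded
```

Passing the uploader in as a callback keeps the unpacking logic testable without a BlueStream connection.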
Processing
The video is transcoded from its original format to several other formats (see proxying above). During this process BlueStream also generates keyframes at scene changes and/or time intervals (intervals can be adjusted using the API). It uses Virage Video Logger to index and do the speech to text. If the audio quality is below exceptional, the usefulness of the speech to text reduces to "garbage in - garbage out." It appears that we can get all the metadata created in this processing out of BlueStream through the API. This speech to text also allows indexing of the video file by recognized keyword. It is unclear if this system supports multiple languages.
There is audio processing functionality. The speech to text component and possibly the transcoding can be utilized for audio.
Documents in certain formats (pdf, office?) go through a full text indexing process. Since all the text is indexed together, it is unclear whether searches can be narrowed to slide titles, captions, etc.
Delivery, RSS
After processing, BlueStream apparently allows the user to deliver the content to a variety of places. Appropriately transcoded versions of the files are made available through the ITCS streaming service. Different versions are also made available for download through the . It should support RSS feeds sometime this spring (a couple of months, but Rob wasn't sure about the timing or the functionality).
Document repository
Takes all MS Office docs and Adobe Acrobat docs. All are text indexed but not by page/slide. If you feed it a PPT file, it will create images of each slide for you (our context images). We may be able to extract content objects (embedded images) from the PPT first and then upload the PPT file with those COs as children. BStream would then create the context images for us and we would link everything in our interface. The document decomp functionality is not supported.
Image repository
When images are uploaded, BStream will grab the EXIF headers and has some XMP support (although it seemed finicky about which file types it would accept for XMP metadata). While this is great for file formats that support EXIF, several image file types in OERca e.g. PNG don't support EXIF at all.
Search
The search features sound pretty dumb (as in "not clever"), but we can do a lot on the client side. It searches the full text index and the metadata that are associated with assets. Again, not a lot out of the box here. There is no functionality to narrow search results on the server side. It is unclear if we can do a "qualified" search (searching particular metadata fields) or if we get results from a single index for all fields. Any narrowing of returned search results would be done client side as we see fit to help users refine the results.
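As a sketch of the client-side narrowing described above: assuming the search call hands back a list of results with their metadata as plain key/value pairs (the result shape and the field names below are assumptions, not BlueStream's), a qualified filter could be as simple as this.

```python
from typing import Dict, Iterable, List


def narrow(results: Iterable[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Keep results whose named metadata fields contain the given text (case-insensitive)."""
    kept = []
    for item in results:
        if all(wanted.lower() in item.get(field, "").lower()
               for field, wanted in criteria.items()):
            kept.append(item)
    return kept


# Hypothetical result records; real field names depend on the metadata schema we define.
results = [
    {"title": "Intro to Anatomy", "type": "video", "course": "ANAT 403"},
    {"title": "Anatomy slides", "type": "ppt", "course": "ANAT 403"},
    {"title": "Intro to Statistics", "type": "video", "course": "STAT 250"},
]
videos = narrow(results, type="video", title="anatomy")
```

Here `videos` ends up holding only the first record, since it is the only one matching both fields.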
Scalability
There appear to be no real technical obstacles to scaling this. There are some questions about "what happens when we get 20,000 unique users that need to log into the system." We're more interested in what happens when 2 million people from all over the world are trying to access our content over the course of a month. The current thinking from BStream is that the ITCS streaming service will break before the Ancept Media Server breaks. ITCS can handle about 3K concurrent streaming sessions - at that point we'd probably see a lot of other things break.
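To put the 2 million per month figure next to the 3K concurrent session limit, here is a back-of-envelope estimate using Little's law (average concurrency = arrival rate x mean session length). The inputs are guesses, not measurements: one stream per person, 10 minutes per stream, and peaks at 5x the monthly average.

```python
def average_concurrency(sessions_per_month: float, mean_minutes: float,
                        days: int = 30) -> float:
    """Little's law: concurrent sessions = arrival rate x mean session length."""
    minutes_in_month = days * 24 * 60
    arrivals_per_minute = sessions_per_month / minutes_in_month
    return arrivals_per_minute * mean_minutes


ITCS_LIMIT = 3000  # "about 3K concurrent streaming sessions" from the meeting

# Assumed inputs: one 10-minute stream per person, peak load 5x the average.
average = average_concurrency(2_000_000, mean_minutes=10)
peak = average * 5
print(f"average {average:.0f}, assumed peak {peak:.0f}, limit {ITCS_LIMIT}")
# prints "average 463, assumed peak 2315, limit 3000"
```

Under these assumptions the load stays below the ITCS limit, but longer sessions or sharper peaks would change that quickly, so the numbers are only a starting point for the conversation.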
General
API Templates in Action:
the build: AnceptMediaServer | ContentManager | IBM DB2
Parent-Child relationships can be established. So we can associate a Content Object with a material and a material with a course.
ACLs can go down to the child level (e.g. all can have access to the edited version of a PPT that is a child of the original PPT, which might only allow access to a few users). However, ACLs cannot be set to allow access to a particular format of an asset while restricting access to the original.
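One way to picture the parent-child and ACL behaviour described above is a small model on our side. This is not BlueStream's data model; the class, the `readers` set and the `EVERYONE` marker are all hypothetical, and it assumes each asset carries its own ACL with nothing inherited from the parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

EVERYONE = "*"  # hypothetical marker meaning "all users"


@dataclass
class Asset:
    name: str
    readers: Set[str]  # users allowed to read this asset
    parent: Optional["Asset"] = field(default=None, repr=False, compare=False)
    children: List["Asset"] = field(default_factory=list)

    def add_child(self, child: "Asset") -> "Asset":
        """Attach a child asset (e.g. a Content Object under a material)."""
        child.parent = self
        self.children.append(child)
        return child

    def can_read(self, user: str) -> bool:
        # Assumption: the ACL is per asset and is not inherited from the parent.
        return EVERYONE in self.readers or user in self.readers


# The original PPT is restricted; its edited child is open to all.
original = Asset("lecture.ppt", readers={"ali", "emily"})
edited = original.add_child(Asset("lecture-edited.ppt", readers={EVERYONE}))
```

Note that the model has no notion of per-format access, which matches the limitation that a particular format of an asset cannot be opened up while the original stays restricted.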
They once looked into machine readable metadata, but did not get very far. So it doesn't currently support RDF, RDFa, MicroFormats.
If we can define our own workflows within BlueStream, we can use the ContentManager to help us organize.
List of Components We Need to Build
BStream as a content processor only
- Push an object into the repository with metadata
- Pull a specific object out of the repository with metadata
- Push the new metadata and object into publishing repository
- Note: The support for getting data out of BlueStream is not very robust at the moment. The idea is to have data live in there long term, which is likely why Jim Ottaviani et al are looking at separate archival copies (in Deep Blue) and distribution copies of the image files (in BlueStream).
BStream as a content repository
- Push an object into the repository with all associated metadata
- Pull a specific object out of the repository with metadata
- Search the repository with keywords and get a list of results
- Use the handle to the asset to display the image, movie or sound file in our presentation layer (eduCommons), or provide a link to download the asset file directly from BlueStream.
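The push, pull and search components listed above could sit behind one small client interface. The sketch below uses an in-memory stand-in so the shape can be tried out without BlueStream; the method names and the handle format are hypothetical, and a real implementation would call the BlueStream API instead of a dictionary.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str]


class InMemoryRepository:
    """Stand-in for the repository; a real client would call the BlueStream API."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}
        self._ids = itertools.count(1)

    def push(self, data: bytes, metadata: Dict[str, str]) -> str:
        """Store an object with its metadata and return a handle to it."""
        handle = f"asset-{next(self._ids)}"  # hypothetical handle format
        self._objects[handle] = StoredObject(data, dict(metadata))
        return handle

    def pull(self, handle: str) -> StoredObject:
        """Fetch a specific object together with its metadata."""
        return self._objects[handle]

    def search(self, keyword: str) -> List[str]:
        """Return handles whose metadata values contain the keyword."""
        needle = keyword.lower()
        return [h for h, obj in self._objects.items()
                if any(needle in v.lower() for v in obj.metadata.values())]
```

The presentation layer would then only ever deal in handles, which fits the last item in the list above.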