Content tagged: orbit

ORBIT Dataset

Novel smartphone apps using Artificial Intelligence (A.I.) are really useful in making visual information accessible to people who are blind or low vision. For instance, Seeing A.I. or TapTapSee allow you to take a picture of your surroundings, and then they tell you what things are recognised, for example, “a person sitting on a sofa”. While A.I. recognises objects in a scene if they are common, at the moment these apps can’t tell you which of the things it recognises is yours, and they don’t know about things that are particularly important to users who are blind or low vision.

While using A.I. techniques in computer vision to recognise objects has made great strides, it does not work so well for personalised object recognition. Previous research has started to make some advances towards solving the problem by looking at how people who are blind or low vision take pictures, what algorithms could be used to personalise object recognition, and the kinds of data that are best suited for enabling personalised object recognition. However, research is currently held back by the lack of available data, particularly from people who are blind or low vision, to use for training and then evaluating A.I. algorithms for personalised object recognition.

This project, funded by Microsoft A.I. for Accessibility, aims to construct a large dataset by involving blind people.

– The ORBIT (Object Recognition for Blind Image Training) Dataset project page

How do you construct a “large dataset”? And do so with the accessibility credo “with us, not for us”? Well, you need a camera app that people will want to use, one that then talks to a dataset-collating back-end. In particular: the user-experience challenge of a camera for the blind, and research-ethics-grade data infrastructure. That’s what I was brought on to make.

project | 2020

ORBIT Dataset published

Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, Katja Hofmann

Object recognition has made great advances in the last decade, but predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. The full dataset contains 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and we mark a subset of 3,822 videos of 486 objects collected by 77 collectors as the benchmark dataset. We propose a user-centric evaluation protocol to evaluate machine learning models for a teachable object recognition task on this benchmark dataset. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

Finally, the published dataset: ORBIT Dataset

diary | 31 mar 2021 | tagged: orbit · research

ORBIT open-sourced

To complement the release of the dataset, here’s the infrastructure used to create it – my code open sourced for future dataset collection projects to build on our work. And as our work shows, machine learning needs more inclusive datasets.

https://github.com/orbit-a11y/ORBIT-Camera
https://github.com/orbit-a11y/orbit_data

diary | 31 mar 2021 | tagged: orbit · research · release · code

ORBIT data, archived

Data captured, collated, archived, and now sent to the vault. My role, done.

diary | 20 feb 2021 | tagged: orbit · research

ORBIT Camera v2 → App Store

ORBIT Camera is available on the App Store, worldwide.

We’re worldwide! We are now in Phase Two of the ORBIT Dataset project, with an improved app, easier instructions, and most importantly: working around the world.

diary | 04 nov 2020 | tagged: orbit · release

ORBIT Camera v2

I was brought back to do some tweaks to the app for the second phase of data collection. Motivated by simplifying the procedure required of the participants, the big-ticket item was to lose the open-ended collection in favour of a fixed number of things, each with a fixed number of videos.

Not quite knowing whether to laugh or cry, this fixed-slot paradigm is what I’d proposed in the first place. Seemed pretty clear to me. Perhaps I could have argued for it better. Anyway, oh my golly did it not only simplify their procedure, it was also a great-and-good simplification of the sighted and accessible UX of the app.

diary | 03 nov 2020 | tagged: orbit · code

ORBIT Phase One data

Data collection phase one comes to an end: test and training imagery for 545 things, in the form of 4568 videos. Having built the system with barely a page of test data, it never gets old seeing the paginator having to truncate itself.

diary | 16 jul 2020 | tagged: orbit · research · code

ORBIT Camera → App Store

ORBIT Camera is available on the UK App Store, for iPhone and iPad. And with that, phase one data collection starts. Huzzah!

The app used by blind and low-vision people to collect videos for the ORBIT dataset project – Object Recognition for Blind Image Training.

If you are blind or have low vision, read on! We are collecting a dataset to help develop AI recognition apps that will use your mobile phone’s camera to find and identify the things that are important to you. For example, has someone moved your house keys? Do you regularly need to identify specific items while out shopping? What about your guide cane – have you forgotten where you put it, or gotten it confused with someone else’s? Maybe you want to recognise the door of a friend’s house? Imagine if you did not have to know exactly where your things were in order to find or identify them again.

To build these recognition apps, a large dataset of videos taken by blind and visually impaired users is needed. As part of the ORBIT dataset project, you will be asked to take multiple videos of at least ten things that are meaningful to you or that you regularly need to find or identify. We will combine these videos with submissions from other ORBIT contributors to form a large dataset of different objects. This dataset can then be used to develop new AI algorithms to help build apps that will work for blind and visually impaired users all over the world.

Not that it was without drama…

Thank you for contacting App Store Review to request an expedited review. We have made a one-time exception and will proceed with an expedited review of ORBIT Camera.

diary | 07 may 2020 | tagged: orbit · release · code

ORBIT Camera, the app

Hot off the digital anvil, an iOS app for blind and low-vision people to collect videos for the ORBIT dataset project. The client for the ORBIT Data server. Written in Swift, using UIKit, UIAccessibility, AVFoundation, GRDB[1], and Ink.

First-run

On first-run, the user gives research project consent. This is a two-page modal screen, first with the ethics-committee-approved participant information, then the consent form. The participant information page has a share button ①, which will produce an HTML file. This provides an appropriate and accessible equivalent of a physical copy of a participant information hand-out.

On providing consent, the server creates the participant record and supplies credentials that the app will then use to access the other API endpoints. The app does not have any record of the personally identifiable information given as part of consent, or of which participant ID it represents; it only has the set of credentials.
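In sketch form, that consent hand-off amounts to a POST that returns an opaque credential. The endpoint, fields and credential shape below are illustrative, not the real ORBIT Data API:

import Foundation

struct ConsentResponse: Decodable {
    let credential: String   // opaque; the app never stores the participant ID or PII
}

func submitConsent(name: String, email: String,
                   completion: @escaping (Result<ConsentResponse, Error>) -> Void) {
    // Hypothetical endpoint and fields; the real routes are defined by ORBIT Data.
    var request = URLRequest(url: URL(string: "https://example.org/api/participant/")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONEncoder().encode(["name": name, "email": email])

    URLSession.shared.dataTask(with: request) { data, _, error in
        if let error = error { return completion(.failure(error)) }
        completion(Result {
            // Keep only the returned credential (e.g. in the keychain); the PII stays server-side.
            try JSONDecoder().decode(ConsentResponse.self, from: data ?? Data())
        })
    }.resume()
}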

This two-screen, step-through first-run is simpler than my UX pitch. That had the idea of being able to test the app before consent, in order to ‘show, not tell’. I was in a ‘selling it’ mindset, whereas our participants would already have been recruited to some degree, so the onus was on getting them collecting data as quickly as possible, with the UX quality to keep them there.

Things list screen

The app follows the Master-Detail pattern. The Master screen lists all the Things the user has added. A thing is something that is important to the user and that they would like a future AI to be able to recognise. Things are created with a user-supplied label ③.

Plus an app/project information modal screen, accessed via ②.

A glaring omission from my UX pitch, fixed here in ④, is tallies of video counts.
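Under the hood, a Thing is little more than that user-supplied label plus its videos. A minimal sketch of how such a record might look with GRDB; the real schema in orbit-a11y/ORBIT-Camera may differ:

import GRDB

// A Thing the user wants a future AI to recognise, identified by its label.
struct Thing: Codable, FetchableRecord, PersistableRecord {
    var id: Int64?
    var label: String
}

// Usage, roughly: insert a new thing, and fetch them all for the Master screen.
// try dbQueue.write { db in
//     try Thing(id: nil, label: "House keys").insert(db)
// }
// let things = try dbQueue.read { db in try Thing.fetchAll(db) }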

Thing record and review screen

This is the Detail screen of the thing-centric Master-Detail pattern. The aim of the app is to capture imagery of a thing, following a certain filming procedure. Visually, this screen presents a carousel of videos organised into the different categories of the procedure. For each category, there are the videos already taken, plus a live-camera view as the last entry, to add a new video. Information and options relating to the video selected in the carousel appear below, with a camera control overlay when appropriate.
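In data-source terms, the shape of that carousel is one section per filming category, where each section’s item count is its videos plus one trailing live-camera slot. A minimal sketch; type and cell names are illustrative, not the app’s actual identifiers:

import UIKit

final class ThingCarouselDataSource: NSObject, UICollectionViewDataSource {
    struct Category {
        let name: String          // filming category name
        var videoURLs: [URL]      // videos already taken for this category
    }

    var categories: [Category] = []

    func numberOfSections(in collectionView: UICollectionView) -> Int {
        categories.count
    }

    func collectionView(_ collectionView: UICollectionView,
                        numberOfItemsInSection section: Int) -> Int {
        // Existing videos, plus one trailing slot for the live camera view.
        categories[section].videoURLs.count + 1
    }

    func collectionView(_ collectionView: UICollectionView,
                        cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
        let isCameraSlot = indexPath.item == categories[indexPath.section].videoURLs.count
        let identifier = isCameraSlot ? "CameraCell" : "VideoCell"
        return collectionView.dequeueReusableCell(withReuseIdentifier: identifier, for: indexPath)
    }
}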

The big change since my UX pitch is losing the fixed slot paradigm. There was a desire not to ‘bake in’ the data collection procedure, so many of the conceptual or UX simplifications have been lost in favour of open-ended collection.

The VoiceOver experience has a different structure. Here, the user first selects the procedure category Ⓐ, within which they can add a new video to that category Ⓑ or review existing videos for that category Ⓒ.
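One way to build that structure is a custom adjustable accessibility element for the category, with separate add and review elements scoped to it. A rough sketch, with illustrative names rather than the app’s actual code:

import UIKit

final class CategoryAccessibilityElement: UIAccessibilityElement {
    var categories: [String] = [] { didSet { updateValue() } }
    var selectedIndex = 0 { didSet { updateValue() } }

    private func updateValue() {
        accessibilityValue = categories.indices.contains(selectedIndex) ? categories[selectedIndex] : nil
    }

    // With the .adjustable trait, swipe up/down moves between filming categories.
    override func accessibilityIncrement() {
        selectedIndex = min(selectedIndex + 1, max(categories.count - 1, 0))
    }

    override func accessibilityDecrement() {
        selectedIndex = max(selectedIndex - 1, 0)
    }
}

// Wiring, roughly:
// let categoryElement = CategoryAccessibilityElement(accessibilityContainer: view)
// categoryElement.accessibilityLabel = "Filming category"
// categoryElement.accessibilityTraits = .adjustable
// categoryElement.categories = filmingCategoryNames   // supplied elsewhere
// view.accessibilityElements = [categoryElement, addVideoElement, reviewVideosElement]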

Plus a filming instructions screen, accessed via ⑤.

UI Notes

For VoiceOver, the touch targets of the UI elements were often inadequate. The clearest example of this is the close button of the first-run / app info / help screen. Swiping left and right through the long-form text to get to this control was impractical. Plus the button was hard to find, partly because it’s small and top-right, and partly because it’s swamped by that content. So the accessible experience was re-jigged so that the close control is a strip along the right-hand edge. Another clear example is the camera’s start/stop button, whose accessible touch target extends to the screen edges ⑦. This means that most screens actually have an entirely bespoke accessibility layer.
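A sketch of what one piece of that bespoke layer amounts to, using the close-strip example; the element and handler names are illustrative:

import UIKit

final class CloseStripElement: UIAccessibilityElement {
    var onActivate: () -> Void = { }

    // Double-tap activation closes the screen, wherever on the strip focus landed.
    override func accessibilityActivate() -> Bool {
        onActivate()
        return true
    }
}

// In the help / first-run controller, roughly:
// let closeStrip = CloseStripElement(accessibilityContainer: view)
// closeStrip.accessibilityLabel = "Close"
// closeStrip.accessibilityTraits = .button
// closeStrip.onActivate = { [weak self] in self?.dismiss(animated: true) }
// closeStrip.accessibilityFrameInContainerSpace = CGRect(
//     x: view.bounds.maxX - 60, y: 0, width: 60, height: view.bounds.height)
// view.accessibilityElements = [infoTextElement, closeStrip]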

More gratuitously, the slick UX centred around the carousel took some coding. The pager does a lot more work, being split into filming procedures with the ‘add new’ tailing element for each procedure section ⑥; it got super-positive feedback from testers used to the first iteration, which had just a plain pager and a selection of filming type for every recording. The carousel itself features multiple camera viewfinders, which meant a lower-level approach than AVCaptureVideoPreviewLayer was required.
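The gist of the lower-level approach: take frames from an AVCaptureVideoDataOutput and push each one to however many viewfinder layers are currently visible. A sketch, with the session and layer wiring simplified and illustrative:

import AVFoundation
import CoreImage
import UIKit

final class MultiViewfinderFeed: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let ciContext = CIContext()
    var viewfinderLayers: [CALayer] = []     // one per visible camera cell in the carousel

    func attach(to session: AVCaptureSession) {
        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "viewfinder.frames"))
        if session.canAddOutput(output) { session.addOutput(output) }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let image = CIImage(cvPixelBuffer: pixelBuffer)
        guard let cgImage = ciContext.createCGImage(image, from: image.extent) else { return }

        DispatchQueue.main.async {
            // The same frame is shared by every visible viewfinder.
            self.viewfinderLayers.forEach { $0.contents = cgImage }
        }
    }
}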


  1. “because one sometimes enjoys a sharp tool” ↩︎

diary | 03 may 2020 | tagged: orbit · code

ORBIT Data, the server

Also hot off the digital anvil: the data back-end to partner the collection app. Hence ORBIT Data, a web app to collate and administer the ORBIT dataset. It comprises a REST API for consent and data upload and status from the iOS app, plus admin functionality for verification and export of the data. Built with Python using the Django framework.
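From the app’s side, the upload half of that API boils down to an authenticated POST of the recorded file, with a status the app can check later. A sketch with hypothetical routes, headers and credential format rather than the real ORBIT Data ones:

import Foundation

func uploadVideo(fileURL: URL, thingID: Int, credential: String,
                 completion: @escaping (Bool) -> Void) {
    // Hypothetical endpoint and metadata header; the real API is defined by ORBIT Data.
    var request = URLRequest(url: URL(string: "https://example.org/api/video/")!)
    request.httpMethod = "POST"
    request.setValue("Token \(credential)", forHTTPHeaderField: "Authorization")
    request.setValue("video/mp4", forHTTPHeaderField: "Content-Type")
    request.setValue(String(thingID), forHTTPHeaderField: "X-Thing-ID")

    // Upload the recorded file; the server replies with a status the app can poll later.
    URLSession.shared.uploadTask(with: request, fromFile: fileURL) { _, response, error in
        let ok = error == nil && (response as? HTTPURLResponse)?.statusCode == 201
        completion(ok)
    }.resume()
}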

diary | 03 may 2020 | tagged: orbit · code
