Keywords: digital health; healthcare; apps; reviews; standards; app attributes; clinical practice; thesaurus; IMART
This article is copyright © 2017 by Springer International Publishing. It is a "pre-print" version, the accepted manuscript of an article to be published in Volume 1 Number 1 of The Journal of Technology in Behavioral Science. The final publication is available at Springer via http://dx.doi.org/10.1007/s41347-016-0005-z
The authors are developing the Interactive Mobile App Review Toolkit (IMART) to overcome major obstacles that are preventing mobile healthcare apps from becoming part of routine clinical practice. Among these barriers are the challenge of finding a high-quality healthcare app among the defective ones flooding the market, the inability of current review methods to address the vast number of apps and other health information technology products being offered [1], the uncertain validity of most app reviews and recommendations [2, 3], and the difficulty of discerning from a review whether an app is appropriate for particular patients and likely to fit into a clinician’s practice approach and workflow [4]. Because efficacy studies are sparse, guidance from a curated collection of expert reviews is crucial in selecting apps to use or recommend in line with whatever practice model suits the immediate needs of a client or patient [5].
In addition to the review toolkit itself, the authors are developing a Digital Health Thesaurus of review standards to provide criteria for assessing the products being reviewed. As distinguished from a simple list of criteria, a thesaurus is a controlled vocabulary of concepts that avoids semantic ambiguity and aids precision and recall in retrieving those concepts, relates the concepts by arranging them into various hierarchies, and contains rules for using the material, such as for search and for conceptually organized display [6].
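To make two of these properties concrete, the following is a minimal Python sketch under deliberately simple assumptions: every entry term resolves to a single preferred concept (avoiding ambiguity), and a query can be expanded to all narrower concepts (aiding recall). The class and method names, and the synonym "ease of learning", are hypothetical illustrations; they are not the structure of the Digital Health Thesaurus itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One controlled-vocabulary concept (hypothetical structure)."""
    preferred: str                                   # the single preferred term
    synonyms: set[str] = field(default_factory=set)  # non-preferred entry terms
    broader: str | None = None                       # parent concept, if any


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._entry: dict[str, str] = {}  # lower-cased entry term -> preferred term

    def add(self, preferred: str, synonyms=(), broader: str | None = None) -> None:
        self.concepts[preferred] = Concept(preferred, set(synonyms), broader)
        for term in (preferred, *synonyms):
            self._entry[term.lower()] = preferred

    def resolve(self, term: str) -> str | None:
        """Map any entry term to its one preferred term (avoids ambiguity)."""
        return self._entry.get(term.lower())

    def narrower(self, preferred: str) -> list[str]:
        return sorted(c.preferred for c in self.concepts.values() if c.broader == preferred)

    def expand(self, term: str) -> list[str]:
        """Resolve a query term and add every concept beneath it (aids recall)."""
        root = self.resolve(term)
        if root is None:
            return []
        found, stack = [], [root]
        while stack:
            current = stack.pop()
            found.append(current)
            stack.extend(self.narrower(current))
        return sorted(found)


# Illustrative entries only; "ease of learning" is a made-up synonym.
thesaurus = Thesaurus()
thesaurus.add("Usability")
thesaurus.add("Learnability", synonyms=["ease of learning"], broader="Usability")
thesaurus.add("Accessibility", broader="Usability")
print(thesaurus.expand("usability"))  # ['Accessibility', 'Learnability', 'Usability']
```

A search for "Ease of Learning" would thus land on the same concept as "Learnability", while a search for "Usability" would also retrieve material filed under its narrower concepts.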
Together, IMART and the Thesaurus will enable reviewers to make high-quality app reviews readily available in a searchable Digital Health Review Library (a part of the toolkit) that will spare clinicians from having to forage for reviews of apps that seem acceptable for their clients/patients and for their practices. Well-designed review criteria will help reviewers indicate the efficacy to be expected of the apps, as well as their risks and costs. Such a systematic approach would allow the responsible and informed clinician to try out the most promising candidates before making recommendations to patients.
The pipeline leading healthcare apps from creation to inclusion in clinical practice is deficient from the start, as there are no quality standards for key aspects of the apps. Indeed, systematic reviews of apps for particular issues may report crippling defects in every app considered [7-9]. Even what initially seem to be the best commercially available healthcare apps may turn out to be almost unusable by patients [10].
The clinically relevant features of nearly all available healthcare apps are not well grounded in science, and the quality of their therapeutic interventions and other features may be poor, but reviews may not communicate this. Currently available reviews of mHealth apps have largely focused on personal impressions rather than on evidence-based, unbiased assessments of clinical performance and data security [3]. On diligent scrutiny, many healthcare apps fail reasonable safety criteria [9, 11]. Torous, Chan, Yellowlees, and Borland described an unregulated free market in which many apps of uncertain quality and efficacy are being developed [12].
Many clinicians support the view that therapeutic health interventions should be guided by science, not by art (as articulated by L’Abate [13], the line between art and charlatanry is very thin indeed); yet professional organizations alone are unlikely to take concrete steps to limit and prohibit irresponsible and sometimes reprehensible practices [13]. Although the president of the American Medical Association has severely criticized many existing first-generation apps [14, 15], no professional organization offers guidelines [16]. Governmental regulation is also lacking for apps that are not directly connected to regulated medical equipment [17-19]. This state of affairs may exist in part because research indicating how best to regulate digital healthcare in general is lacking [20, 21], and in part because of a reluctance to inhibit progress in the rapidly developing app industry [22, 23]. The result is succinctly summarized by Wicks and Chiauzzi: there are few centralized gatekeepers between app developers and end-users, no systematic surveillance of harms, and little power for enforcement [24]. The current authors therefore seek to advance the position taken by Pereira-Azevedo et al. [25] that a credible process for certifying apps could improve the safety and quality of the apps that clinicians seriously consider.
A new healthcare app must be discoverable to be adopted into clinical practice [26], but well-stocked, easy-to-use repositories of such apps do not yet exist. For instance, the search features of the major online app outlets, namely the iOS App Store (https://itunes.apple.com/us/genre/ios/id36), Google Play (https://play.google.com/store/apps), Windows Phone Store (https://www.microsoft.com/en-us/store/apps/windows-phone/) and BlackBerry World (https://appworld.blackberry.com/webstore/), retrieve a large and unwieldy set of apps in response to general search terms [27], contributing to a clinician’s health app overload [28, 29] from the more than 165,000 health apps publicly available [4]. The descriptions, popularity statistics and user reviews posted for each app retrieved from these store outlets may be all that a clinician takes time to consider, but these data are insufficient to clarify which apps are worth looking into and certainly are not a reliable basis for bringing an app into clinical use [9, 11, 30-32]. Improving search functionality alone offers little prospect of incremental benefit for clinicians, patients, carers or general consumers [33].
Privacy safeguards are insufficiently addressed in search results. Over one-third of the most-downloaded health and fitness apps do not present their privacy policy on their app store listing page, requiring installation of the app and release of personal information to reveal the policy... and 30% of such apps turn out not to have a privacy policy [34]. Many apps offer privacy policies that do not protect privacy, fail to disclose that they share clinical information with outside parties, and do not implement even rudimentary security safeguards [35-38]. Installing certain apps could result in massive clandestine leakage of information about the user [39]. Information acquired from app users, particularly data related to health and fitness, is commercially highly desirable, and selling it is common practice among app purveyors [40, 41]. Because many patients are concerned about keeping their personal health information confidential, a clinician who employs or endorses an app, or whom a patient or client remembers as having tacitly given the nod to an app mentioned by either party during a session, opens herself to liability should the app later be reported to be unsafe [38, 42]. At the least, this could impair the patient’s trust and weaken the clinician’s effectiveness, especially with patient groups that are particularly distrustful of hospitals, insurers, physicians and all healthcare components [43].
Before using an app with a client or patient or recommending it, a practitioner should become thoroughly familiar with its features and functionality and decide whether employing the app is likely to result in substantial improvement in a clinical issue that is important to the practitioner or to the patient [44] without burdening clinical workflow [45]. Clinicians cannot personally review each candidate app retrieved from an app store. To narrow their range of options, clinicians need an up-to-date list of apps that have been carefully selected for their quality and that are adequately described. However, of the various websites that have undertaken to offer lists of favored apps, most tend not to be curated by clinical professionals and their recommendations lack clear validity [9]. The exception may be the recently announced RANKED Health approach (http://www.rankedhealth.com/approach/), with its expert consensus reviews of a small but growing number of healthcare apps and its plan to cover digital devices as well.
It remains for expert reviews to help winnow out poorly constructed and poorly performing apps and to focus attention on the best ones for various purposes, but existing reviews generally are not adequate for a clinician’s discovery and selection of appropriate apps. For one thing, surveys have found no uniformity in defining the app attributes to be rated, in setting criteria for ratings, or in organizing low-level attributes into the higher-level categories that are awarded star ratings [46-60].
Several groups have proposed frameworks for evaluating apps, either as a set of general categories for clinicians themselves to use [16, 56] or as specific rating instruments for reviewers to apply [60]. However, these proposals do not take into account the perspectives and real-life challenges faced by service providers [5]. Table 1 shows examples of app qualities that have been recommended for clinicians to inspect [16, 56], for app developers to address [61] and for consumers to consider [62], as well as features that are particularly user-centered [63].
Unfortunately, while the above systems at first seem to make enough sense of the issues to be useful to an inquisitive clinician, finding that reviewers disagree in their star ratings of such high-level categories as those listed in Table 1 does not amount to understanding the points of disagreement. The categories do not provide for suggestions to clinicians about incorporating an app into clinical care. Nor are they detailed enough to pinpoint remediable weaknesses or to give app developers an actionable way to judge improvements in a new version of an app.
For many clinicians, guessing at the value of an app may involve only a cursory examination of the vendor’s product description and of the distribution of consumer ratings as posted at an app store. Even supplementing this impression with an expert’s star ratings leaves the clinician with little guidance for safely installing the app on his own device, familiarizing himself with it and arriving at a judgment about whether to use it with any patients.
On the other hand, if reviewers would work from a standardized and detailed schedule of attributes, reviewed according to explicit criteria, these more concrete assessments could be compared across reviewers. The overall ratings awarded to various general categories would thus be more reliable and clinicians could verify their basis. The current lack of a viable app certification system could be addressed with technology-assisted handcrafted reviews, produced with a tool that facilitates attention to the necessary level of detail, implements research-justified guidelines, and encourages a reviewer to comment in-depth about any special safety concerns or other issues raised by the app [64].
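The advantage of a shared, detailed schedule can be illustrated with a small Python sketch. It assumes, purely for illustration, that two reviewers scored the same Attributes of one app on a 1-5 ordinal scale; the Attribute "Navigation", the scores and the two-point threshold are invented. Identical Cluster-level summaries can conceal exactly the disagreements that an Attribute-by-Attribute comparison exposes.

```python
from __future__ import annotations

from statistics import median

# Hypothetical Attribute scores (1-5 ordinal scale) from two reviewers of the same app.
reviewer_a = {"Learnability": 5, "Accessibility": 1, "Navigation": 3}
reviewer_b = {"Learnability": 2, "Accessibility": 4, "Navigation": 3}


def disagreements(a: dict[str, int], b: dict[str, int], min_gap: int = 2) -> dict[str, int]:
    """Attributes rated by both reviewers whose scores differ by at least min_gap points."""
    shared = sorted(a.keys() & b.keys())
    return {name: abs(a[name] - b[name]) for name in shared
            if abs(a[name] - b[name]) >= min_gap}


# A Cluster-level summary can hide the disagreement entirely...
print(median(reviewer_a.values()), median(reviewer_b.values()))  # 3 3
# ...while the Attribute-level comparison pinpoints where it lies.
print(disagreements(reviewer_a, reviewer_b))  # {'Accessibility': 3, 'Learnability': 3}
```

In a real library, such a comparison would only be meaningful because both reviewers worked from the same Attribute definitions and rating scales.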
In 2014, the TeleMental Health Institute (http://telehealth.org) recruited a team of experts, originally to improve generation of reviews of mobile behavioral healthcare apps that would be posted on its website for a general audience by extending the MAP/MAP instrument [56]. However, similar approaches, focused on rating just a handful of dimensions, came under criticism as allowing too much opportunity for subjectivity, resulting in low inter-rater reliability of key clinical measures [65].
Encouragingly, major progress toward generating scientifically verifiable assessments was recently made by the Mobile Application Rating Scale (MARS) [60, 66]. MARS both clearly defines app attributes and clusters them in a validated, practical instrument that enables more objective, multidimensional rating and comparison of mobile health apps [60, 66]. Another recent innovation is RANKED Health (http://www.rankedhealth.com/), where clinicians can discover apps reviewed as best in class by a consensus of expert reviewers and an editor, in a process resembling the peer review of journal articles.
In response to progress in the field of app reviews, the authors of the current article developed a new framework for producing reviews that would also support the development of apps and the investigation of which aspects of apps achieve specific outcomes for various problems in various populations [67]. They adopted a detailed, two-pronged approach to fostering reviews. First, reviews would be grounded in a fine-grained, well-defined list of app aspects, each of which can be assessed objectively and rigorously. Second, the detailed aspects would be assembled into a thesaurus that defines terms unambiguously and groups them into easily understood categories that enable star ratings of app qualities relevant to clinical practice. Review readers could access the detailed scores to confirm those at-a-glance summaries. Accommodating both detail and generalization is intended to avoid the many problems created by directly scoring high-level categories [68]. The framework would be instantiated as an online system for guiding and facilitating the creation of reviews. The system would present its reviews in a curated central library where they could be discovered, aggregated and directly compared with one another along their built-in uniform dimensions. This repository would enable systematic study of the reviews, of the reviewing process and even of the reviewers themselves.
The current authors undertook a comprehensive literature search of English-language papers from 2000 to 2016 to compile a list of app attributes for reviewers to consider. They examined such app review services as the Anxiety and Depression Association of America’s (ADAA) Mental Health Apps website (http://www.adaa.org/finding-help/mobile-apps), iMedicalApps, IMSHealth AppScript (https://www.appscript.net/), the American Health Information Management Association (AHIMA)’s MyPHR (https://www.myphr.com/Resources/mobile_PHRs.aspx), the National Health Service (UK) Health Apps Library (http://apps.nhs.uk/), One Mind Institute’s PsyberGuide (http://psyberguide.org/) and Social Wellth (http://socialwellth.com/), as well as the curriculum standards developed by the Centers for Disease Control and Prevention [69] and the behavior change taxonomy of Abraham and Michie [70]. They also considered the mERA guidelines for reporting mobile-based health interventions [86], the framework for app risk assessment proposed in [71], the Future of Privacy Forum recommendations [34], the RCP Health Informatics Unit checklist [72], the properties influencing aesthetic appraisal of user interfaces described in [73], the criteria proposed for serious games [74] and the classification of the large medical app database of the University of Texas Health Science Center at Houston [75].
On the basis of this list and their professional judgment, the authors then compiled an extensive set of Attributes of apps or app reviews to be rated. (See the Glossary below for definitions of bolded terms.) The aspiration for this collection is that Attributes not overlap and that the set be comprehensive, covering all aspects of mobile healthcare apps and reviews that could reasonably be subject to evaluation and reporting. Each Attribute is associated with a brief definition, a rating scale and an article contained in a Digital Health Standards Database. The authors created a supplementary wiki that holds expanded explanations of the Attributes. Because this wiki can also accommodate invited scholarly discussion about an Attribute as well as user comments, it is named the Digital Health Encyclopedia.
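A minimal Python sketch of what one such record might hold is shown below, assuming a four-point labeled scale. Keeping the definition, the rating scale and the linked article together lets a data entry form refuse off-scale ratings. The field names, the definition wording and the scale labels are illustrative assumptions, not the actual schema of the Digital Health Standards Database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeStandard:
    """One row of a hypothetical Standards Database table."""
    name: str
    definition: str
    scale: tuple[str, ...]     # ordered rating labels, lowest first
    encyclopedia_article: str  # title of the linked Encyclopedia article

    def validate(self, rating: str) -> int:
        """Return the rating's position on the scale, rejecting off-scale values."""
        if rating not in self.scale:
            raise ValueError(f"{rating!r} is not on the {self.name} scale {self.scale}")
        return self.scale.index(rating)


learnability = AttributeStandard(
    name="Learnability",
    definition="How quickly a new user can operate the app without help.",  # illustrative
    scale=("poor", "fair", "good", "excellent"),
    encyclopedia_article="Learnability",
)
print(learnability.validate("good"))  # 2
```

Storing the scale with the definition, rather than in the form code, means every reviewer sees the same criteria for the same Attribute.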
The authors then added a requirement that the set of Attributes be organized hierarchically as a theory-driven taxonomy that contributes to the reviewer’s perspective on the meaning of each Attribute. Therefore, the Attributes were organized into logical Clusters in a way cognizant of how material has been grouped in published reviews of apps and of health-related websites [46-55, 57-59, 70, 76]. Each Cluster has its own rating scale and Digital Health Encyclopedia article. Clusters are further classified into Quadrants in a theory-based manner, as explained below.
The database of Attributes, their arrangement into a taxonomy and their extended description in the Encyclopedia amount to a thesaurus [6]; hence the name Digital Health Standards Thesaurus.
Figure 1. A simplified mock-up of an IMART Reviewer’s Workbook data entry form for the Learnability Attribute. The reviewer alters data for both Learnability and its superordinate Usability Cluster. The links at the bottom navigate to a form for a different Attribute within the Usability Cluster. Clicking on a “breadcrumb” link at the top of the form navigates to a data entry form for a different Cluster. The system offers guidance or editable default text in the Comment boxes.
The Usability Cluster shown in this example is one of the IMART Reviewer’s Workbook Clusters that calls for a “star rating” that could appear in the published review. In the Comment text shown for Usability, clicking on the “accessibility” link opens the associated Digital Health Encyclopedia article. The “USABILITY” and “LEARNABILITY” labels similarly link to the Encyclopedia.
Having established a preliminary thesaurus containing Attributes and Clusters, the authors next designed the IMART Reviewer’s Workbook, an online instrument that presents a series of interactive forms for scoring and commenting on an app’s Attributes.
A reviewer initializes a blank online Workbook instance by entering information identifying herself and a target app, and then stores the new Workbook in the Digital Health Review Library as a private review available only to the reviewer. She may then bring up an IMART Reviewer’s Workbook data entry form for an Attribute, select a score from its rating scale and enter commentary. Forms such as the one illustrated in Figure 1 are also used to assess Attribute Clusters.
IMART includes in its toolkit a Review Drafting Wizard that automatically renders the reviewer’s ratings and comments into a compact, readable Review Report. This draft acts as feedback that reflects how the ratings might come across to someone reading the review. The reviewer is free to modify Workbook data to see how that changes the automated draft. The reviewer can edit the draft to create a final version of her Review Report in the Digital Health Review Library, where a moderator may then agree to make the review public. Library visitors can search among all public reviews produced with the IMART Reviewer’s Workbook and can submit comments that will be available to all visitors. The Library also holds the detailed ratings and comments reviewers have made in their Workbooks and makes these data available to Library visitors as a supplement to the Review Reports.
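The rendering step can be pictured with a minimal Python sketch, assuming a 0-5 star Cluster scale and a plain-text layout; the names ClusterEntry and draft_report, the comment text and the app name are hypothetical. Because the reviewer can change any score and render the draft again, the output serves as the feedback loop described above, while the detailed Attribute scores remain available separately.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterEntry:
    """Workbook data for one Cluster (hypothetical structure)."""
    name: str
    stars: int                      # Cluster rating on an assumed 0-5 star scale
    comment: str = ""
    attributes: dict[str, int] = field(default_factory=dict)  # detailed Attribute scores


def draft_report(app_name: str, clusters: list[ClusterEntry]) -> str:
    """Render Workbook data as a terse draft; the detailed scores are kept separately."""
    lines = [f"Review of {app_name}"]
    for c in clusters:
        if not 0 <= c.stars <= 5:
            raise ValueError(f"{c.name}: star rating {c.stars} is off the 0-5 scale")
        summary = f"{c.name}: " + "★" * c.stars + "☆" * (5 - c.stars)
        if c.comment:
            summary += f" ({c.comment})"
        lines.append(summary)
        lines.append(f"  based on {len(c.attributes)} rated Attribute(s)")
    return "\n".join(lines)


usability = ClusterEntry("Usability", 4, "easy to learn; weak accessibility",
                         {"Learnability": 5, "Accessibility": 2})
print(draft_report("ExampleApp", [usability]))
```

A production wizard would of course use the Thesaurus hierarchy to decide what to summarize; this sketch only shows the compaction from many scores to one line per Cluster.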
Thus IMART has three main components: the IMART Reviewer’s Workbook, the Review Drafting Wizard and the Digital Health Review Library. The Digital Health Standards Thesaurus is composed of the Digital Health Standards Database and the Digital Health Encyclopedia (Table 2). IMART depends on the Thesaurus for its review criteria and can supply data based on the experience of reviewers and comments in the Library for upgrading the Standards and Encyclopedia articles.
Having chosen or been assigned an app, the reviewer copies information from the vendor into the Reviewer’s Workbook. She then downloads, installs and starts using the app while noting her observations in the Reviewer’s Workbook. The reviewer cycles among these activities, perhaps turning to already-published reviews, consumer comments and other sources of information. When the reviewer feels she has annotated enough Reviewer’s Workbook Attributes and Clusters, she activates the Workbook’s Review Drafting Wizard and edits its suggested review presentation. The reviewer may then amend her entries or rate additional Attributes. A review may then be submitted to a Digital Health Review Library volunteer moderator for conversion of its status from “private” to “public”.
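The status change at the end of this cycle can be modeled as a small state machine. The Python sketch below assumes an intermediate "submitted" state and a "return" action that sends a review back to its author; neither is specified in the description of IMART above, and the role names are likewise hypothetical. Its one essential rule is that only a moderator can make a review public.

```python
from enum import Enum


class Status(Enum):
    PRIVATE = "private"      # visible only to the reviewer
    SUBMITTED = "submitted"  # assumed intermediate state awaiting a moderator
    PUBLIC = "public"        # searchable by all Library visitors


# (current status, action, actor role) -> next status
TRANSITIONS = {
    (Status.PRIVATE, "submit", "reviewer"): Status.SUBMITTED,
    (Status.SUBMITTED, "approve", "moderator"): Status.PUBLIC,
    (Status.SUBMITTED, "return", "moderator"): Status.PRIVATE,
}


def advance(status: Status, action: str, role: str) -> Status:
    """Apply one workflow action, refusing anything the table does not allow."""
    try:
        return TRANSITIONS[(status, action, role)]
    except KeyError:
        raise ValueError(f"{role} cannot {action} a {status.value} review") from None


state = advance(Status.PRIVATE, "submit", "reviewer")
state = advance(state, "approve", "moderator")
print(state.value)  # public
```

Keeping the allowed transitions in a table, rather than in scattered conditionals, makes the moderation policy easy to inspect and to change.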
An instance of the IMART Reviewer’s Workbook accepts and organizes a reviewer’s detailed and systematic observations by drawing upon the Digital Health Standards Database for definitions of Attributes and rating scales. The Review Drafting Wizard of the Reviewer’s Workbook helps the reviewer create a readable presentation. The reviewer could add audio narrative, screenshots of the app and video demonstrations into the final version of her review.
The reviewer’s workflow is illustrated in Figure 2.
The authors anticipate that a theory-based clustering of Attributes in both the Digital Health Encyclopedia and the IMART Reviewer’s Workbook can provide important context that enhances the understanding of reviewers and readers. Assessment of an app at the Cluster level is not simply a matter of estimating the central tendency of its underlying Attribute scores. Instead, it is a vertically complex task [77, 78] that requires recursive consideration of Cluster qualities and underlying Attribute qualities and adds information for a Cluster that does not exist at the Attribute level alone. Furthermore, evaluating Attributes and Clusters in tandem (Figure 1) clarifies matters for the reviewer. Lastly, the hierarchical arrangement of Attributes and Clusters determines how the Review Drafting Wizard compacts raw Reviewer’s Workbook data into a terse and readable Review Report. This raises the question of what principles to use in devising a taxonomic hierarchy.
Figure 3 diagrams the organization of a single instance of an IMART Reviewer’s Workbook. Each Attribute in the Workbook offers the reviewer its definition and a rating scale to be scored, and solicits the reviewer’s text comments. An associated article in the Digital Health Encyclopedia further clarifies what the Attribute addresses. Attributes are grouped into Clusters. Cluster scores and comments are expected to summarize information entered for their Attributes, but the reviewer is free to create and explain discrepancies. Clusters themselves are classified into Quadrants.
Reported statistical correlations between Attribute ratings may be suggestive but have little direct application in classifying Attributes. A finding that reviewers’ values for certain app Attributes (or, by analogy, physicians’ ratings of certain physical signs) tend to covary does not imply a lack of incremental clinical value in reporting them individually. Therefore, the authors’ decisions about how to assign Attributes to Clusters were made heuristically, based on face validity, on clinical experience, on the clustering suggested in the surveyed publications and also in accordance with the L’Abate Quadrant Architecture, as described below.
Quadrants, Clusters, Attributes, rating scales and text entry fields in an IMART Reviewer’s Workbook form a hierarchy where each Cluster and Attribute has a unique name, its own rating scale and a text entry field (Figure 3).
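As a minimal Python sketch of this hierarchy, each node below carries a name, its own rating scale and a text field, and a check enforces that names are unique across the whole Workbook. The class and function names are hypothetical, the five-point scale is assumed, and placing the Usability Cluster under the Feature Quadrant is an illustrative assumption rather than IMART's actual assignment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A Quadrant, Cluster or Attribute in one Workbook instance (hypothetical)."""
    name: str
    level: str                   # "Quadrant", "Cluster" or "Attribute"
    scale: tuple[str, ...] = ()  # own rating scale; left empty for Quadrants here
    comment: str = ""            # the reviewer's text entry field
    children: list[Node] = field(default_factory=list)


def walk(node: Node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def check_unique_names(roots: list[Node]) -> None:
    """Raise if any name appears twice anywhere in the Workbook hierarchy."""
    seen: set[str] = set()
    for root in roots:
        for node in walk(root):
            if node.name in seen:
                raise ValueError(f"duplicate name: {node.name}")
            seen.add(node.name)


FIVE_POINT = ("1", "2", "3", "4", "5")  # assumed scale
# Placing Usability under the Feature Quadrant is an illustrative assumption.
workbook = [
    Node("Feature", "Quadrant", children=[
        Node("Usability", "Cluster", FIVE_POINT, children=[
            Node("Learnability", "Attribute", FIVE_POINT),
            Node("Accessibility", "Attribute", FIVE_POINT),
        ]),
    ]),
]
check_unique_names(workbook)  # passes silently
```

Unique names let every score, comment and Encyclopedia link be addressed unambiguously, which is what allows reviews to be compared field by field.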
The reviewer is invited not only to provide scores and/or descriptions for Attributes, but also to score the Clusters. The Workbook will propose as the default score for a Cluster the median score of its subordinate Attributes. If the reviewer opts to override this default, or if there is wide variation among the subordinate Attribute ratings, the reviewer is invited to post an explanation in the text entry field for the Cluster. In doing so, the reviewer may recognize an error, or may choose to discuss an outstanding Attribute quality in the final review presentation. For some purposes, a clinician who reads the review may consider an app unacceptable if a certain Attribute rating is too low, even if its overall Cluster rating is high.
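The default-and-override rule lends itself to a short Python sketch. The median default follows the text, and the reviewer remains free to override it; the numeric threshold treated as "wide variation" (a range of more than two scale points) and all function names are assumptions for illustration. The last function reflects the reading clinician's perspective: a single low Attribute can disqualify an app even when its Cluster rating is high.

```python
from __future__ import annotations

from statistics import median


def cluster_default(attribute_scores: list[int]) -> float:
    """The Workbook's proposed Cluster score: the median of its Attribute scores."""
    return median(attribute_scores)


def needs_explanation(attribute_scores: list[int], cluster_score: float,
                      max_spread: int = 2) -> bool:
    """True if the reviewer overrode the default or the Attribute ratings vary widely.

    "Wide variation" is assumed here to mean a range larger than max_spread points.
    """
    overridden = cluster_score != cluster_default(attribute_scores)
    wide = max(attribute_scores) - min(attribute_scores) > max_spread
    return overridden or wide


def below_floor(attribute_scores: dict[str, int], floors: dict[str, int]) -> list[str]:
    """Attributes rated under a clinician's own minimum, whatever the Cluster score."""
    return [name for name, floor in floors.items()
            if name in attribute_scores and attribute_scores[name] < floor]


scores = {"Learnability": 5, "Accessibility": 1, "Navigation": 4}  # illustrative
print(cluster_default(list(scores.values())))  # 4
```

With these scores, the default Cluster score is 4, yet the reviewer would still be prompted to explain the spread, and a clinician requiring at least 3 for Accessibility would set the app aside.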
Ultimately, as previously depicted in Figure 1, the reviewer’s opinions about certain of these Clusters most resemble the bottom-line conclusions that are often expressed as a handful of star ratings in most existing app reviews. Such compact summaries are particularly influential with consumers [79]. However, in contrast to other review systems, IMART’s top-level general ratings are justified by explicit assessment of their subordinate Attributes. These lower-level opinions are available to the visitor reading a review in the Digital Health Review Library. The reviewer’s comments that accompany the star ratings, together with the evaluations of Attributes that underlie those general conclusions, constitute the type of argument that enhances the credibility and persuasiveness of a review for a knowledgeable audience such as clinicians [80].
IMART groups its Clusters of Attributes in the manner suggested by the late psychology theorist Luciano L’Abate. When Dr. L’Abate joined the authors’ team in 2015, he showed how to conceptualize apps and app reviews according to scientific principles first articulated in 1962 by Thomas Kuhn, who held that theories necessarily rest on paradigms: conventional, religious and cultural assumptions about the nature of reality that typically are taken for granted and not critically examined [81]. L’Abate and colleagues cautioned in several books that socially constructed paradigms should be regarded as separate from, yet underlying, all explanatory theories, and that the paradigm should be recognized and explicitly distinguished when judging a particular theory. Furthermore, to warrant any attention, a theory should consist of models, that is, cause-and-effect relationships that can be tested and falsified [82]. L’Abate wrote that the specious interchangeability of paradigm, theory and model causes confusion that impedes the unification of clinical psychology [83]. L’Abate had therefore proposed architectural principles that impose a hierarchic shape upon the field of psychological theories, permitting more scientific, less prejudiced and less subjective investigation of the validity of any theory.
L’Abate cautioned the authors of the present article that failing to separate and examine concepts about mobile apps that should occupy different hierarchical levels obscures some of the apps’ clinically important features. He posited that viewing apps from the perspective of a hierarchical quadrant structure would clarify many aspects of apps that have previously gone undiscussed and perhaps unrecognized. This view is in line with that of Mendelsohn [84], who took the position that the role of a critic goes beyond telling whether, or even how, something is good or bad. Rather, it is to enlarge the reader’s own critical thinking and perspective, and to examine an entity and its genre from outside the box while still doing honor to the subject.
In line with L’Abate’s epistemological analysis, the authors developed their set of Attributes and Clusters in the Reviewer’s Workbook to conform to the four divisions of what they now term the L’Abate Quadrant Architecture.
Simplified hierarchies for three app reviews are diagrammed. The reviews are demarcated on the Identity level of the L’Abate Quadrant Architecture. The review on the left stems from one Basis (such as schema therapy), but the second and third reviews are of apps that arise from another Basis (such as mindfulness). It could be that two reviewers have evaluated the same app, or that the second and third reviews are of different apps. One of the two Outcomes shown for the third identified review is depicted in color. That Outcome is shown as being influenced by three Features of the third app. For clarity, only one Feature is shown for the other Outcome that is described in the third review. Each box contains its own hierarchies of Attributes, their Clusters and their individual ratings and reviewer’s comments as depicted in Figure 3.
The Basis Quadrant is the root of the hierarchy proposed by L’Abate. From the viewpoint of a patient and clinician, the promise of any healthcare or self-care app is to provide benefit for some problem or aspiration in accord with some professional school of thought or some common, tacit belief about human functioning, particularly about motivation and change in attitudes and behavior. Before inquiring about an app’s safety, effectiveness and efficiency, a clinician would ask what the app is supposed to accomplish and along what theoretical lines: Why consider this app in the first place? What is it for? What is its clinical strategy?
Answers to these questions are unavailable in most existing reviews, which focus on describing shortcomings in generalized ratings of broad categories. The only actionable outcome of reading a negative review is to avoid using the app. The pivotal difference offered by the IMART Reviewer’s Workbook is the attention given to describing and tagging a reviewed app according to its clinical purpose. More specifically, the Reviewer’s Workbook calls for the reviewer to communicate how the goals of the app relate to the theoretical psychological assumptions behind the app’s approach, and how closely the app adheres to whichever theoretical principles it claims to follow. The IMART Workbook therefore positions the Clusters and Attributes relevant to these questions at the root of its hierarchy, the Basis Quadrant, which corresponds to L’Abate’s original Paradigm level [83].
The following example aims to clarify the distinctions between the hierarchic levels. Suppose that a clinician is seeking an app for a patient who lives with agoraphobia. Clinicians would generally understand that an app designed to implement cognitive behavioral therapy (CBT) principles differs from one designed with a mindfulness perspective. In the CBT case, the user might be asked to document, at the very least, times, situations, thoughts and behaviors. In the mindfulness case, users may be asked to describe something unusual or emotionally moving in their worlds. The assumption in the CBT app would be that identifying times, situations, thoughts and behaviors would decrease the undesired behavior. In the mindfulness app, the underlying assumption would be that increased awareness itself would generalize into reduced rumination and worry. Yet another psychological approach may rely on entertainment and distraction to break up a self-sustaining spell of worrying. The Reviewer’s Workbook would call upon the reviewer to rate and describe such theoretically relevant characteristics of an app in the Basis Quadrant (Figure 4).
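Tagging reviews by Basis makes the clinician's first question searchable. In the hypothetical Python sketch below, the app names, tag vocabulary and record fields are invented; it shows how the clinician in this example could restrict public reviews to a chosen theoretical approach before looking at Outcomes or Features.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRecord:
    """Minimal stand-in for a Library review (hypothetical fields)."""
    app: str
    basis: str               # theoretical approach tagged in the Basis Quadrant
    targets: frozenset[str]  # clinical problems the app is meant to address
    public: bool = True


def find_reviews(library: list[ReviewRecord], problem: str,
                 basis: str | None = None) -> list[ReviewRecord]:
    """Public reviews of apps for `problem`, optionally limited to one Basis."""
    return [r for r in library
            if r.public and problem in r.targets
            and (basis is None or r.basis == basis)]


library = [
    ReviewRecord("AppA", "CBT", frozenset({"agoraphobia"})),
    ReviewRecord("AppB", "mindfulness", frozenset({"agoraphobia", "worry"})),
    ReviewRecord("AppC", "distraction", frozenset({"worry"})),
    ReviewRecord("AppD", "CBT", frozenset({"agoraphobia"}), public=False),
]
print([r.app for r in find_reviews(library, "agoraphobia", basis="CBT")])  # ['AppA']
```

Without a Basis filter, the same query would return both the CBT and the mindfulness apps, leaving the clinician to sort out their differing assumptions by hand.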
L’Abate’s four-level hierarchy may alternatively be viewed as projected onto a plane to form four “Quadrants” positioned along two orthogonal continua, namely Abstraction–Concreteness and Generality–Specificity. The Basis Quadrant contains a reviewer’s descriptions of those app Attributes that concern the app’s purpose, the kind of benefits the app promises to bring and the psychological concept of how this will come about. The Identity Quadrant holds concrete detail identifying which app is under review and who the reviewer is, along with notes about the review process itself. The Outcome Attributes are for rating the impact the reviewer expects users of the app to experience and how well the app’s promise is likely to be fulfilled. The reviewer rates and describes the app’s technicalities as its Features. (After L’Abate [83].)
Because many apps may strive to work according to a particular Basis, the immediately subordinate Identity level specifies which app is being reviewed. The Clusters situated at the Identity level concretely contain the name and version of the app under review as well as information about the developer, the reviewer’s qualifications and the review process (e.g., the span of time over which the app was tried). In short, the Basis addresses the purpose for which a set of apps might be considered and the assumptions or psychological theory under which they are expected to work (why and how), while the Identity addresses which app is under review and by whom (Figure 5).
Subordinate to the Identity level is the Outcome level, which contains Clusters of Attributes that describe how well the app keeps its promise to benefit the patient and to fit the clinician’s professional orientation and workflow (Figures 4 and 5).
Attributes at the Feature level are subordinate to their Outcomes, since each Outcome is produced by one or more program features of the app. Features concern what the app is concretely composed of: its nuts and bolts. Outcomes and Features, being relatively specific, may more readily be judged against quality criteria using ordinal scales. The Attributes at the Basis and Identity levels, on the other hand, are more general, so most are rated as present or absent or are simply named or described.
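The four-level hierarchy and its two rating styles can be made concrete with a small data model. The Python sketch below is an illustration only, not the IMART implementation: the `Level` and `RatingKind` enums, the `Attribute` record, the 1 to 5 ordinal scale and the example Attribute names are all hypothetical. It follows the text in ordering the levels Basis, Identity, Outcome, Feature and in defaulting the specific levels to ordinal ratings and the general ones to present/absent entries.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Union


class Level(IntEnum):
    """Hierarchic levels; a larger value means a more subordinate level."""
    BASIS = 0
    IDENTITY = 1
    OUTCOME = 2
    FEATURE = 3


class RatingKind(Enum):
    PRESENCE = "present/absent"
    TEXT = "named or described"
    ORDINAL = "ordinal scale"


def default_kind(level: Level) -> RatingKind:
    """Typical rating kind: ordinal for the specific levels, presence otherwise."""
    return RatingKind.ORDINAL if level >= Level.OUTCOME else RatingKind.PRESENCE


@dataclass(frozen=True)
class Attribute:
    name: str
    cluster: str
    level: Level
    kind: RatingKind
    scale_max: int = 5  # hypothetical 1..5 scale, used only for ORDINAL Attributes


Rating = Union[bool, str, int]


def is_valid_rating(attr: Attribute, value: Rating) -> bool:
    """Check that a reviewer's entry has the form the Attribute's rating kind expects."""
    if attr.kind is RatingKind.PRESENCE:
        return isinstance(value, bool)
    if attr.kind is RatingKind.TEXT:
        return isinstance(value, str) and value.strip() != ""
    # ORDINAL: an integer from 1 to scale_max; bool is excluded because it subclasses int
    return (isinstance(value, int) and not isinstance(value, bool)
            and 1 <= value <= attr.scale_max)


def is_subordinate(a: Attribute, b: Attribute) -> bool:
    """True if Attribute a sits at a lower (more subordinate) level than b."""
    return a.level > b.level


# Usage with hypothetical Attributes drawn from the agoraphobia example
cbt = Attribute("CBT principles", "Therapeutic approach", Level.BASIS,
                default_kind(Level.BASIS))
log = Attribute("Thought and behavior log", "Self-monitoring", Level.FEATURE,
                default_kind(Level.FEATURE))
print(is_valid_rating(cbt, True), is_valid_rating(log, 4), is_valid_rating(log, 9))
# prints: True True False
```

Keeping the rating kind on each Attribute, rather than fixing it per level, leaves room for the exceptions the text allows ("most are rated as present or absent").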
To clarify matters visually, in addition to their hierarchical arrangement, these four Quadrant levels may be depicted as lying on two axes: abstract–concrete and general–specific (Figure 5). Both the hierarchical organization and the two organizing dimensions were suggested by Professor L’Abate as adaptations of his Quadrant scheme for psychological theories and their paradigms [83].
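The two-axis view can likewise be expressed as a 2 x 2 grid. The text states that Basis and Identity are the general levels and Outcome and Feature the specific ones, and it describes Identity and Feature content as concrete; placing Basis and Outcome at the abstract end is this sketch’s own assumption, as are all the Python names used.

```python
from enum import Enum
from typing import Dict, Tuple


class Abstraction(Enum):
    ABSTRACT = "abstract"
    CONCRETE = "concrete"


class Generality(Enum):
    GENERAL = "general"
    SPECIFIC = "specific"


# Assumed placement of the Quadrants on the two axes; an interpretation of the
# text, not the layout of the authors' figure.
QUADRANTS: Dict[str, Tuple[Abstraction, Generality]] = {
    "Basis": (Abstraction.ABSTRACT, Generality.GENERAL),
    "Identity": (Abstraction.CONCRETE, Generality.GENERAL),
    "Outcome": (Abstraction.ABSTRACT, Generality.SPECIFIC),
    "Feature": (Abstraction.CONCRETE, Generality.SPECIFIC),
}

# Each cell of the 2 x 2 plane should hold exactly one Quadrant.
assert len(set(QUADRANTS.values())) == len(QUADRANTS) == 4


def quadrant_at(abstraction: Abstraction, generality: Generality) -> str:
    """Return the Quadrant that occupies a given cell of the plane."""
    for name, cell in QUADRANTS.items():
        if cell == (abstraction, generality):
            return name
    raise KeyError((abstraction, generality))


print(quadrant_at(Abstraction.CONCRETE, Generality.SPECIFIC))  # prints: Feature
```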
Listing app Attributes and their Clusters, arranging them in a hierarchy, imposing further organization in accordance with L’Abate Quadrant Architecture and defining the Attributes in the Digital Health Encyclopedia together amount to a thesaurus [85]. This thesaurus has multilingual potential and can be expanded to cover many aspects of digital health beyond apps, websites and sensor apparatus. The authors therefore refer to the combination of the Digital Health Standards Database and the Digital Health Encyclopedia as the Digital Health Thesaurus.
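One way to see what makes this combination a thesaurus rather than a plain list is a toy controlled vocabulary in which every synonym, abbreviation or translation resolves to one preferred term, and a search on a broader term also retrieves its narrower terms. The `Thesaurus` class and the example terms below are hypothetical and only sketch the general technique; they are not drawn from the Digital Health Standards Database.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class Thesaurus:
    """Toy controlled vocabulary: preferred terms, entry labels, broader/narrower links."""

    def __init__(self) -> None:
        self.broader: Dict[str, Optional[str]] = {}
        self.narrower: Dict[str, Set[str]] = defaultdict(set)
        self.use_for: Dict[str, str] = {}  # any accepted label (lowercased) -> preferred term

    def add_term(self, term: str, broader: Optional[str] = None,
                 labels: Iterable[str] = ()) -> None:
        """Register a preferred term, its broader term and its non-preferred labels."""
        if broader is not None and broader not in self.broader:
            raise KeyError(f"unknown broader term: {broader}")
        self.broader[term] = broader
        if broader is not None:
            self.narrower[broader].add(term)
        for label in [term, *labels]:
            self.use_for[label.lower()] = term

    def preferred(self, label: str) -> str:
        """Resolve a synonym, abbreviation or translation to its single preferred term."""
        return self.use_for[label.lower()]

    def expand(self, label: str) -> Set[str]:
        """The preferred term plus every narrower term beneath it (raises recall in search)."""
        result: Set[str] = set()
        stack = [self.preferred(label)]
        while stack:
            term = stack.pop()
            if term not in result:
                result.add(term)
                stack.extend(self.narrower[term])
        return result


# Usage with hypothetical terms; the Spanish and French labels illustrate multilingual use
t = Thesaurus()
t.add_term("Therapeutic approach")
t.add_term("Cognitive behavioral therapy", broader="Therapeutic approach",
           labels=["CBT", "terapia cognitivo-conductual"])
t.add_term("Mindfulness", broader="Therapeutic approach", labels=["pleine conscience"])
print(t.preferred("cbt"))  # prints: Cognitive behavioral therapy
print(sorted(t.expand("Therapeutic approach")))
# prints: ['Cognitive behavioral therapy', 'Mindfulness', 'Therapeutic approach']
```

Resolving labels to one preferred term addresses semantic ambiguity and precision, while expanding through narrower terms supports recall, the two properties the authors attribute to a thesaurus.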
As reviews are produced and data accumulate, any rearrangement, reorganization or refinement of the Digital Health Thesaurus would be reflected automatically in subsequent versions of the IMART Reviewer’s Workbook, because the Workbook is driven directly by the Digital Health Standards Database. The fine divisions of assessable aspects of apps and of reviews implemented in the Digital Health Standards Database, together with the definitions and discussion contained in the Digital Health Encyclopedia, could assist in designing new apps and in judging whether a proposal for an app or other digital health product should receive agency or foundation funding or venture capital investment.
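Automatic propagation follows from generating the Workbook from the database rather than hand-coding its contents. A minimal sketch, assuming a single hypothetical `attribute` table in an in-memory SQLite database (the actual schema of the Digital Health Standards Database is not described here), might look like this:

```python
import sqlite3
from typing import Dict, List

LEVEL_RANK = {"Basis": 0, "Identity": 1, "Outcome": 2, "Feature": 3}


def open_standards_db() -> sqlite3.Connection:
    """In-memory stand-in for the Standards Database (hypothetical one-table schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attribute ("
        " level TEXT NOT NULL, level_rank INTEGER NOT NULL,"
        " cluster TEXT NOT NULL, name TEXT NOT NULL, scale TEXT NOT NULL)"
    )
    return conn


def add_attribute(conn: sqlite3.Connection, level: str, cluster: str,
                  name: str, scale: str) -> None:
    """Insert one Attribute row; the level's rank is stored for hierarchical ordering."""
    conn.execute(
        "INSERT INTO attribute VALUES (?, ?, ?, ?, ?)",
        (level, LEVEL_RANK[level], cluster, name, scale),
    )


def build_workbook(conn: sqlite3.Connection) -> Dict[str, Dict[str, List[str]]]:
    """Generate Workbook sections (level -> cluster -> prompts) directly from the table."""
    workbook: Dict[str, Dict[str, List[str]]] = {}
    rows = conn.execute(
        "SELECT level, cluster, name, scale FROM attribute"
        " ORDER BY level_rank, cluster, name"
    )
    for level, cluster, name, scale in rows:
        workbook.setdefault(level, {}).setdefault(cluster, []).append(f"{name} ({scale})")
    return workbook


db = open_standards_db()
add_attribute(db, "Feature", "Self-monitoring", "Thought log", "ordinal 1-5")
add_attribute(db, "Basis", "Therapeutic approach", "CBT principles", "present/absent")
print(list(build_workbook(db)))  # prints: ['Basis', 'Feature']

# Refining the database is enough: the next generated Workbook picks up the new row.
add_attribute(db, "Outcome", "Clinical fit", "Fit with clinician workflow", "ordinal 1-5")
print(list(build_workbook(db)))  # prints: ['Basis', 'Outcome', 'Feature']
```

Because `build_workbook` holds no Attribute list of its own, inserting the Outcome row changes the next generated Workbook without any edit to the generating code.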
Certifying an ample and diverse set of apps for clinical use and keeping up with the many innovations, though currently impractical [1], could be accomplished with reviews produced efficiently with the IMART online resources. The Digital Health Thesaurus can also support the creation of a list of the core competencies needed to review apps adequately. A reviewer should certainly be aware of an app’s relevant individual components, of how they work to produce their effects and of their basic assumptions, and should be able to gauge how an app would be clinically applicable, sufficiently safe for patients and therapeutically beneficial. This suggests a need for training and guidance through which a reviewer learns to address every Attribute, even if experience later justifies judicious shortcuts and knowing what to skim past. Reviewers can be supervised and graded as they use IMART Reviewer’s Workbooks to review a benchmark app. The list of competencies can be expanded into a curriculum and a test for reviewers, leading to a certificate of completion of training.
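Grading a trainee on a benchmark app could, for instance, compare the trainee’s Workbook entries with a supervisor’s reference entries. The scoring rule below (coverage of Attributes, plus agreement within one point on ordinal ratings, with any non-empty description counted as addressing a descriptive Attribute) is an assumption made for illustration, as are the function and Attribute names; the authors do not specify a grading method.

```python
from typing import Dict, Optional, Union

Rating = Union[bool, str, int]


def grade_trainee(reference: Dict[str, Rating],
                  trainee: Dict[str, Optional[Rating]],
                  tolerance: int = 1) -> Dict[str, float]:
    """Score a trainee's benchmark Workbook against a supervisor's reference entries.

    coverage:  share of reference Attributes the trainee addressed (non-None entry).
    agreement: share of addressed Attributes that match; ordinal ratings match if
               within `tolerance` points, descriptions match if non-empty, and
               present/absent entries must be equal.
    """
    addressed = [a for a in reference if trainee.get(a) is not None]
    matches = 0
    for attr in addressed:
        ref, got = reference[attr], trainee[attr]
        ref_is_ordinal = isinstance(ref, int) and not isinstance(ref, bool)
        got_is_ordinal = isinstance(got, int) and not isinstance(got, bool)
        if ref_is_ordinal and got_is_ordinal:
            matches += abs(ref - got) <= tolerance
        elif isinstance(ref, str):
            # Free text cannot be compared mechanically; a supervisor would read it.
            matches += isinstance(got, str) and got.strip() != ""
        else:
            matches += ref == got
    coverage = len(addressed) / len(reference) if reference else 0.0
    agreement = matches / len(addressed) if addressed else 0.0
    return {"coverage": coverage, "agreement": agreement}


# Hypothetical benchmark entries
reference = {"CBT principles": True, "App name": "ExampleApp",
             "Fit with workflow": 4, "Data security": 3}
trainee = {"CBT principles": True, "App name": "ExampleApp 2.1",
           "Fit with workflow": 2, "Data security": None}
scores = grade_trainee(reference, trainee)
print(scores["coverage"])  # prints: 0.75
```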
Public access to a reviewer’s entries in the Workbook that led to her published review would help readers gain deeper insight into an app and perhaps assist them in making their own assessments of both the review and the app itself. A clinician who takes a deep dive into a reviewer’s Workbook entries about an app, or who opens a new IMART Reviewer’s Workbook and composes his own review, is likely to understand the app better and to become able to use it more effectively with his patients.
IMART will be available to anyone who provides verifiable personal identification. Such registered users could develop their own reviews restricted to personal viewing. Moderators will consider reviews by specifically qualified experts for publication in the Digital Health Review Library. All registered users could append comments to those reviews and to Digital Health Encyclopedia articles. Accumulated testimonials by clinicians would amount to post-marketing experience and would add to an app’s evidence base. Such information could help third-party payers decide whether to add an app to their formulary in connection with pre-authorization of reimbursement for a clinician’s inclusion of the app in a course of treatment [86].
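These access rules amount to a small permission model. The sketch below encodes them under stated assumptions: the `Status` values, the `User` flags and all function names are hypothetical, published reviews are treated as readable by anyone because the Review Library is public, and comments on Digital Health Encyclopedia articles are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PRIVATE = "private"      # visible to its author only
    SUBMITTED = "submitted"  # awaiting a moderator's decision
    PUBLIC = "public"        # published in the Digital Health Review Library


@dataclass(eq=False)
class User:
    name: str
    verified: bool = False          # has provided verifiable personal identification
    qualified_expert: bool = False
    moderator: bool = False


@dataclass(eq=False)
class Review:
    author: User
    status: Status = Status.PRIVATE
    comments: List[str] = field(default_factory=list)


def create_review(author: User) -> Review:
    if not author.verified:
        raise PermissionError("only registered (verified) users may write reviews")
    return Review(author)


def can_view(user: User, review: Review) -> bool:
    if review.status is Status.PUBLIC:
        return True
    if review.status is Status.SUBMITTED and user.moderator:
        return True
    return user is review.author


def submit_for_publication(review: Review) -> bool:
    """Only reviews by qualified experts go forward to the moderators."""
    if review.author.qualified_expert and review.status is Status.PRIVATE:
        review.status = Status.SUBMITTED
        return True
    return False


def moderate(moderator: User, review: Review, approve: bool) -> None:
    if not moderator.moderator or review.status is not Status.SUBMITTED:
        raise PermissionError("only a moderator may decide a pending submission")
    review.status = Status.PUBLIC if approve else Status.PRIVATE


def add_comment(user: User, review: Review, text: str) -> None:
    if not user.verified or review.status is not Status.PUBLIC:
        raise PermissionError("registered users may comment on published reviews only")
    review.comments.append(f"{user.name}: {text}")


expert = User("expert", verified=True, qualified_expert=True)
clinician = User("clinician", verified=True)
mod = User("mod", verified=True, moderator=True)
review = create_review(expert)
print(can_view(clinician, review))  # prints: False
submit_for_publication(review)
moderate(mod, review, approve=True)
add_comment(clinician, review, "Worked well with my patients")
print(review.status.value, review.comments)
# prints: public ['clinician: Worked well with my patients']
```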
The IMART framework flexibly accommodates reviews of various kinds of mobile health apps and other digital health and welfare products. These include direct-to-consumer products intended for use without professional guidance, resources designed for access via desktop computers, and products that operate independently to monitor and report on symptoms, behaviors and physiological values.
The IMART system is under active development and the authors welcome comment and collaboration from the professional health care, computer science and scientific communities.
App — mobile application; a computer program running on a portable device such as a smartphone or tablet.
Attribute — an aspect of an app or other digital product to be rated in a review.
Basis Quadrant — the collection of Attributes related to the purposes of a digital health product and the theories behind how the goals are to be achieved.
Cluster — a collection of closely related Attributes.
Digital Health Encyclopedia — the wiki containing articles that describe Attributes in depth as well as other articles about digital healthcare and human welfare topics.
Digital Health Review Library — a public database that holds Review Reports and the IMART Reviewer’s Workbook data upon which the reports are based.
Digital Health Standards Database — the database underlying the IMART Reviewer’s Workbook, holding definitions of Attributes and their rating scales.
Digital Health Thesaurus — the Attributes and taxonomic arrangements contained in the Digital Health Standards Database, together with the expanded definitions, expositions and discussions contained in the Digital Health Encyclopedia and rules for using and displaying such material.
Feature Quadrant — The Quadrant that contains Attributes that concern the software and technical features of a product under review.
Identity Quadrant — The Quadrant that contains Attributes of a product being reviewed that describe what that product is, who has produced it and who is reviewing it.
IMART — The online Interactive Mobile App Review Toolkit system consisting of the Reviewer’s Workbook, the Review Drafting Wizard and the Digital Health Review Library.
IMART Reviewer’s Workbook — The authors’ online resource for generating a review of a digital health product.
L’Abate Quadrant Architecture — A conceptual framework for classifying Attributes into four major categories.
Outcome Quadrant — The Quadrant that contains Attributes related to the clinical effects of an app or other product being reviewed.
Quadrant — One of the four high-level divisions of L’Abate Quadrant Architecture.
Review Drafting Wizard — A feature of the IMART Reviewer’s Workbook that automatically suggests text for a reviewer to use in writing a review.
Review Report — A concise human-readable summary of the data a reviewer has entered into an IMART Reviewer’s Workbook.
Science and Pseudoscience in Clinical Psychology (2nd ed.), Edited by Scott O. Lilienfeld, Steven J. Lynn, and Jeffrey M. Lohr. The American Journal of Family Therapy, 43(2), 210-211. doi:10.1080/01926187.2014.1002365
Trust but verify—Five approaches to ensure safe medical apps. BMC Medicine, 13(1), 205. doi:10.1186/s12916-015-0451-z