Page 64 - Technology and Innovation Journal - 19-1
402 DEMPWOLF & SHNEIDERMAN
unexpected anomalies. Patterns may be as simple as seeing how often patents lead to start-up companies getting founded or venture capital investments lead to acquisition of start-up companies, or they may be more complex.

Temporal event sequences consist of thousands or millions of events, which include the record ID (company name, ID#, etc.), a timestamp (could be by the year or day or to the second; e.g., 6-2-25), and an event category (patent, company launched, initial public offering, etc.). This information about single point events can be assembled into records with a dozen or a thousand events (Table 1).

Temporal event sequences also include interval events, such as a one-year Small Business Innovation Research (SBIR) grant or a research project or clinical trial, in which case the event will have a start and an end timestamp (Table 2).

Initial efforts usually focus on cleaning the data, which often contains incorrect, incomplete, redundant, mislabeled, or surprising inputs. Typical errors include blank fields, erroneous record ID, misspelled event category, incorrect timestamp, or a start date that is later than an end date. Visual displays amplify human abilities to spot errors such as outliers in a scatterplot, surprising spikes in a timeline, or missing links in a network diagram.

The second data challenge involves record matching and disambiguation across data sources. For example, this project involves matching data from U.S. Food and Drug Administration (FDA) approvals, clinical trials, patents, research grants, and other sources where EventFlow records correspond to individual products. While products are named in the FDA databases and often in clinical trial data, those names often do not appear in patent or research grant data. Federal agencies, including the National Institutes of Health (NIH) and the FDA, have produced some ad hoc databases that help with some of this matching, allowing us to present some preliminary results in this paper, but much of this work remains to be done.

Once data has been cleaned and matched, standard algorithms for identifying volatile or stable periods in timelines can be used to speed analyses. The combination of visual displays and statistical methods brings great power to analysts.

HOW LONG DOES INNOVATION TAKE?

Innovation trajectories, or the paths or lines of development that innovation follows, describe the sequences of innovation activities that translate initial and intermediate inputs into intermediate outputs and final outcomes. Like physical trajectories,
Table 1. Sample Single-Point Events

Record ID  Event Category  Start Date  Attributes
ALTOPREV   Patent          12/12/1997  docnum="5916595";Organization="Andrx Pharmaceuticals, Inc"
ALTOPREV   Patent          12/12/1997  docnum="6485748";Organization="Andrx Pharmaceuticals, Inc"
ALTOPREV   Patent          3/23/1998   docnum="6080778";Organization="CHILDREN'S HOSPITAL CORP"
ALTOPREV   FDA Approval    6/26/2002   docnum="N21316";Organization="COVIS PHARMA SARL"
AMYVID     Patent          3/26/2007   docnum="7687052";Organization="UNIVERSITY OF PENNSYLVANIA"
AMYVID     Patent          8/5/2008    docnum="8506929";Organization="UNIVERSITY OF PENNSYLVANIA"
AMYVID     FDA Approval    4/6/2012    docnum="N202008";Organization="AVID RADIOPHARMACEUTICALS"
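
To make the event structure and the cleaning checks described in the text concrete, the following is a minimal Python sketch. It is not the EventFlow implementation or its file format. The names Event, parse_date, parse_attributes, find_problems, and KNOWN_CATEGORIES are hypothetical, chosen for illustration, and the category list is an assumption that covers only the two categories shown in Table 1. A single-point event has a start date only; an interval event also has an end date.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, List, Optional


@dataclass
class Event:
    """One temporal event; `end` stays None for single-point events."""
    record_id: str
    category: str
    start: Optional[date]
    end: Optional[date] = None
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_date(text: str) -> Optional[date]:
    """Parse the M/D/YYYY dates used in Table 1; None if blank or malformed."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:
        return None


def parse_attributes(text: str) -> Dict[str, str]:
    """Split 'key="value";key="value"' into a dict.

    Assumes values never contain ';', which holds for the rows in Table 1.
    """
    attrs = {}
    for part in text.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip().strip('"')
    return attrs


# Assumption: only the categories that appear in Table 1 are listed here.
KNOWN_CATEGORIES = {"Patent", "FDA Approval"}


def find_problems(event: Event) -> List[str]:
    """Return the typical data errors named in the text that this event shows."""
    problems = []
    if not event.record_id.strip():
        problems.append("blank record ID")
    if event.category not in KNOWN_CATEGORIES:
        problems.append("unknown event category")
    if event.start is None:
        problems.append("missing or invalid start date")
    elif event.end is not None and event.start > event.end:
        problems.append("start date later than end date")
    return problems


# First row of Table 1.
ok = Event(
    "ALTOPREV",
    "Patent",
    parse_date("12/12/1997"),
    attributes=parse_attributes(
        'docnum="5916595";Organization="Andrx Pharmaceuticals, Inc"'
    ),
)

# Made-up interval event with deliberate errors; not taken from the data.
bad = Event("", "Patnt", parse_date("6/1/2010"), parse_date("6/1/2009"))

print(find_problems(ok))
print(find_problems(bad))
```

Checks like these catch only mechanical errors. A misspelled category is reported as unknown rather than corrected, and an incorrect but well-formed timestamp passes silently, which is where the visual inspection described in the text remains useful.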

