Page 64 - Technology and Innovation Journal - 19-1

402                           DEMPWOLF & SHNEIDERMAN



unexpected anomalies. Patterns may be as simple as how often patents lead to the founding of start-up companies or how often venture capital investments lead to the acquisition of start-ups, or they may be more complex.

Temporal event sequences consist of thousands or millions of events, each of which includes a record ID (company name, ID number, etc.), a timestamp (at a resolution of a year, a day, or a second; e.g., 6-2-25), and an event category (patent, company launch, initial public offering, etc.). These single-point events can be assembled into records containing anywhere from a dozen to a thousand events (Table 1).

Temporal event sequences also include interval events, such as a one-year Small Business Innovation Research (SBIR) grant, a research project, or a clinical trial. Each interval event has both a start and an end timestamp (Table 2).

Initial efforts usually focus on cleaning the data, which often contains incorrect, incomplete, redundant, mislabeled, or surprising inputs. Typical errors include blank fields, erroneous record IDs, misspelled event categories, incorrect timestamps, and start dates that are later than end dates. Visual displays amplify human abilities to spot such errors, for example as outliers in a scatterplot, surprising spikes in a timeline, or missing links in a network diagram.

The second data challenge involves record matching and disambiguation across data sources. For example, this project involves matching data from U.S. Food and Drug Administration (FDA) approvals, clinical trials, patents, research grants, and other sources, where EventFlow records correspond to individual products. While products are named in the FDA databases and often in clinical trial data, those names often do not appear in patent or research grant data. Federal agencies, including the National Institutes of Health (NIH) and the FDA, have produced some ad hoc databases that help with part of this matching, which allowed us to present some preliminary results in this paper, but much of this work remains to be done.

Once data has been cleaned and matched, standard algorithms for identifying volatile or stable periods in timelines can be used to speed up analyses. Combining visual displays with statistical methods gives analysts considerable power.

HOW LONG DOES INNOVATION TAKE?

Innovation trajectories, that is, the paths or lines of development that innovation follows, describe the sequences of innovation activities that translate initial and intermediate inputs into intermediate outputs and final outcomes. Like physical trajectories,

      Table 1. Sample Single-Point Events

Record ID   Event Category   Start Date   Attributes
ALTOPREV    Patent           12/12/1997   docnum="5916595";Organization="Andrx Pharmaceuticals, Inc"
ALTOPREV    Patent           12/12/1997   docnum="6485748";Organization="Andrx Pharmaceuticals, Inc"
ALTOPREV    Patent           3/23/1998    docnum="6080778";Organization="CHILDREN'S HOSPITAL CORP"
ALTOPREV    FDA Approval     6/26/2002    docnum="N21316";Organization="COVIS PHARMA SARL"
AMYVID      Patent           3/26/2007    docnum="7687052";Organization="UNIVERSITY OF PENNSYLVANIA"
AMYVID      Patent           8/5/2008     docnum="8506929";Organization="UNIVERSITY OF PENNSYLVANIA"
AMYVID      FDA Approval     4/6/2012     docnum="N202008";Organization="AVID RADIOPHARMACEUTICALS"
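To make the record structure of Table 1 concrete, the following is a minimal sketch of how such rows might be parsed into single-point events and assembled into one time-ordered sequence per record ID. It assumes the attribute strings always take the form of key="value" pairs separated by semicolons, as in Table 1, and that dates are written month/day/year. The `PointEvent` class and the helper functions are hypothetical illustrations, not EventFlow's actual input format or API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, datetime

# Hypothetical representation of one single-point event (not EventFlow's own format).
@dataclass
class PointEvent:
    record_id: str
    category: str
    start: date
    attributes: dict = field(default_factory=dict)

# Matches key="value" pairs; values may contain commas or apostrophes but not double quotes.
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')

def parse_attributes(text: str) -> dict:
    """Turn 'docnum="5916595";Organization="..."' into a dict of attributes."""
    return dict(ATTR_PATTERN.findall(text))

def parse_row(record_id: str, category: str, start: str, attrs: str) -> PointEvent:
    # Assumption: dates are month/day/year, as in Table 1.
    start_date = datetime.strptime(start, "%m/%d/%Y").date()
    return PointEvent(record_id, category, start_date, parse_attributes(attrs))

def group_into_records(events) -> dict:
    """Assemble point events into one time-ordered sequence per record ID."""
    records = defaultdict(list)
    for ev in events:
        records[ev.record_id].append(ev)
    for seq in records.values():
        seq.sort(key=lambda ev: ev.start)
    return dict(records)

# A subset of Table 1, deliberately listed out of date order.
rows = [
    ("ALTOPREV", "Patent", "12/12/1997", 'docnum="5916595";Organization="Andrx Pharmaceuticals, Inc"'),
    ("ALTOPREV", "FDA Approval", "6/26/2002", 'docnum="N21316";Organization="COVIS PHARMA SARL"'),
    ("AMYVID", "FDA Approval", "4/6/2012", 'docnum="N202008";Organization="AVID RADIOPHARMACEUTICALS"'),
    ("AMYVID", "Patent", "3/26/2007", 'docnum="7687052";Organization="UNIVERSITY OF PENNSYLVANIA"'),
]

records = group_into_records(parse_row(*r) for r in rows)
for rid, seq in records.items():
    print(rid, [(ev.category, ev.start.isoformat()) for ev in seq])
```

Sorting each sequence by date means that questions such as "how long from first patent to FDA approval?" reduce to comparing the first and last events of a record.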
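Several of the typical errors named in the data-cleaning discussion (blank fields, misspelled event categories, incorrect timestamps, and start dates later than end dates) can be flagged automatically before visual inspection. The sketch below illustrates one way to do this. It assumes a hypothetical fixed vocabulary of event categories and the month/day/year dates used in Table 1. The category list and `check_event` are illustrative only, not the authors' cleaning pipeline. Python's standard `difflib.get_close_matches` is used to suggest a likely intended category for a misspelling.

```python
import difflib
from datetime import datetime

# Hypothetical controlled vocabulary; a real project would define its own.
KNOWN_CATEGORIES = ["Patent", "FDA Approval", "Clinical Trial", "SBIR Grant"]

def parse_date(text):
    """Return a date for month/day/year text, or None if it cannot be parsed."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:
        return None

def check_event(record_id, category, start, end=None):
    """List problems in a point event (end=None) or an interval event."""
    problems = []
    if not record_id.strip():
        problems.append("blank record ID")
    if category not in KNOWN_CATEGORIES:
        # Suggest the closest known category, if any is similar enough.
        guess = difflib.get_close_matches(category, KNOWN_CATEGORIES, n=1)
        hint = f" (did you mean {guess[0]!r}?)" if guess else ""
        problems.append(f"unknown event category {category!r}{hint}")
    start_date = parse_date(start)
    if start_date is None:
        problems.append(f"unparseable start date {start!r}")
    if end is not None:
        end_date = parse_date(end)
        if end_date is None:
            problems.append(f"unparseable end date {end!r}")
        elif start_date is not None and start_date > end_date:
            problems.append("start date is later than end date")
    return problems

print(check_event("AMYVID", "Patnet", "3/26/2007"))
print(check_event("", "SBIR Grant", "9/1/2010", "8/31/2009"))
print(check_event("ALTOPREV", "FDA Approval", "13/45/2002"))
```

Checks like these catch only mechanical errors. Subtler problems, such as a plausible but wrong timestamp, still depend on the visual displays described above.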