
Memex
NASA JPL · DARPA · Others
A five-year DARPA effort to build web crawlers and search engines geared towards government and scientific research use cases, and to shed light on the Deep Web. The effort brought together several big names in research, academia and industry across the US: NASA JPL, NIST, Stanford, MIT Lincoln Labs, USC ISI, NYU, CMU, Georgetown University, Hyperion Gray, Anaconda and others. It was the first DARPA effort of its kind to open source all of its code and research: https://github.com/darpa-i2o/memex-program-index.
I worked on the ideation and implementation of a wide range of novel techniques for this project. Some of the more interesting ones:
- Added classification- and reinforcement-learning-based focusing capability to open source crawlers, for gathering large amounts of data on specific domains and topics.
- Reinforcement-learning-based policy learning for crawlers.
- Human-in-the-loop training of webpage relevance classification models.
- Automated API learning.
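
To give a feel for what "focusing" a crawler means, here is a minimal sketch of a classifier-guided crawl frontier. It is not the Memex code. The toy in-memory web (`TOY_WEB`), the keyword set (`TOPIC_TERMS`), and the functions `relevance` and `focused_crawl` are all hypothetical names made up for illustration. A simple keyword-overlap score stands in for a trained relevance classifier, and no network access is involved.

```python
import heapq

# Hypothetical toy "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("space mission index", ["a", "b"]),
    "a": ("mars rover mission telemetry", ["c"]),
    "b": ("cooking recipes and pasta", ["d"]),
    "c": ("rover mission science data", []),
    "d": ("more pasta recipes", []),
}

# Assumed topic vocabulary; a real system would use a trained model instead.
TOPIC_TERMS = {"mission", "rover", "mars", "science"}


def relevance(text):
    """Stand-in for a relevance classifier: fraction of words that are on topic."""
    words = text.split()
    return sum(w in TOPIC_TERMS for w in words) / len(words) if words else 0.0


def focused_crawl(seed, budget):
    """Visit up to `budget` pages, always expanding the most promising link first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        score = relevance(text)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Each link is prioritised by the relevance of the page linking to it.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    for url, score in focused_crawl("seed", budget=3):
        print(url, round(score, 2))
```

With a budget of three pages, the crawl spends its whole budget on the on-topic branch and never fetches the off-topic pages. A reinforcement-learning variant would, under the same assumptions, replace the fixed "inherit the parent's score" rule with a link-scoring policy learned from the rewards of previously crawled pages.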