NLWeb
NLWeb is a collection of open protocols and associated open source tools. Its main focus is establishing a foundational layer for the AI Web, much as HTML did for document sharing. To make this vision real, NLWeb provides practical implementation code: not as the definitive solution, but as proof-of-concept demonstrations of one possible approach. We expect and encourage the community to develop diverse, innovative implementations that surpass our examples. This mirrors the web's own evolution, from the humble 'htdocs' folder in NCSA's HTTP server to today's massive data center infrastructures, all unified by shared protocols that enable seamless communication.
AI has the potential to enhance every web interaction. Realizing this requires a collaborative spirit reminiscent of the Web's early "barn raising" days: shared protocols, sample implementations, and community participation are all essential. NLWeb brings together protocols, Schema.org formats, and sample code to help sites quickly implement conversational endpoints, benefiting users through natural interfaces and agents through structured interaction.
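To make "conversational endpoint" concrete: a site accepts a natural-language query and answers with structured, Schema.org-typed results. The sketch below is illustrative only; the `ask` function, the item data, and the response shape are my assumptions, not the actual NLWeb protocol.

```python
import json

# Toy "site content", already described in Schema.org terms.
# Types and fields are invented for illustration.
ITEMS = [
    {"@type": "Recipe", "name": "Lentil Soup", "keywords": "vegan, soup"},
    {"@type": "Recipe", "name": "Beef Stew", "keywords": "stew, beef"},
]

def ask(query: str) -> dict:
    """Hypothetical conversational endpoint: match a natural-language
    query against the site's items and return a Schema.org ItemList."""
    words = query.lower().split()
    hits = [it for it in ITEMS
            if any(w in (it["name"] + " " + it["keywords"]).lower()
                   for w in words)]
    return {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": hits,
    }

response = ask("any vegan soup?")
print(json.dumps(response, indent=2))
```

The point is not the (trivial) matching, but that the reply is machine-readable structure an agent can consume, rather than prose.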
Learn more at github.com/NLWeb.ai.
Data Commons
Data Commons is an open knowledge repository that combines data from public datasets by mapping them to a common set of entities. It includes tools to explore and analyze data across different datasets without any data cleaning or joining.
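The core idea can be sketched with made-up data: once each dataset's local identifiers are mapped to a shared entity ID, variables from different sources line up automatically, with no cleaning or joining by the user. The datasets, local IDs, and values below are invented; only the DCID style of identifier (e.g. "geoId/06" for California) follows Data Commons.

```python
# Two "datasets" that refer to the same place under different local IDs.
census = {"US-CA": {"population": 39_000_000}}
health = {"ca.gov": {"life_expectancy": 81.0}}

# Mapping from each dataset's local IDs to a shared entity ID
# (Data Commons uses DCIDs such as "geoId/06" for California).
entity_map = {"US-CA": "geoId/06", "ca.gov": "geoId/06"}

# Merge: one record per shared entity, variables from all datasets.
merged: dict[str, dict] = {}
for dataset in (census, health):
    for local_id, values in dataset.items():
        merged.setdefault(entity_map[local_id], {}).update(values)

print(merged["geoId/06"])
# → {'population': 39000000, 'life_expectancy': 81.0}
```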
In addition to a Data Commons about places (demographics, health, crime, economics, etc.), we are also building a Biomedical Data Commons and are starting on an Energy/Climate Data Commons.
I started working on this during my stint away from Google, collaborating with Andrew Moore and Chaitanya Baru in the context of the Open Knowledge Network effort. I returned to Google in November 2017 and started building Data Commons.
Schema.org
Schema.org provides schemas for structured data on the web. It is in use by over 25 million sites and powers a range of applications in search, personal assistants, email, etc.
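A typical use is JSON-LD markup embedded in a page. The snippet below builds a small example with the stdlib; the event itself is invented, but the `@context`/`@type` conventions are the ones Schema.org defines.

```python
import json

# A minimal Schema.org description of an (invented) event, as JSON-LD.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Web Standards Meetup",
    "startDate": "2024-05-01",
    "location": {"@type": "Place", "name": "Community Hall"},
}

# This string would be embedded in a page inside
# <script type="application/ld+json"> ... </script>.
markup = json.dumps(event, indent=2)
print(markup)
```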
We started this project in 2010 together with collaborators from Microsoft and Yahoo. It launched in 2011 and shortly after, we were joined by Dan Brickley, who has been co-running Schema.org with me since.
Custom Search
We started Google Custom Search (later Programmable Search Engine) to explore the idea of a platform where someone could apply their knowledge of a domain to create a better search for that domain, on top of Google's infrastructure, leveraging its web crawl, etc.
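One way to picture the idea: domain knowledge, here reduced to a site allowlist with per-site boosts, layered over a generic result stream. The results, sites, and ranking rule below are toy stand-ins, not the actual product's configuration or algorithm.

```python
# Generic results from an underlying engine (invented data).
results = [
    {"url": "https://example.com/a", "score": 0.9},
    {"url": "https://recipes.example.org/b", "score": 0.7},
    {"url": "https://spam.example.net/c", "score": 0.8},
]

# Domain knowledge supplied by the search's creator: which sites
# to include, and how much to boost each (a toy stand-in).
site_boosts = {"example.com": 1.0, "recipes.example.org": 1.5}

def domain(url: str) -> str:
    return url.split("/")[2]

custom = sorted(
    (r for r in results if domain(r["url"]) in site_boosts),
    key=lambda r: r["score"] * site_boosts[domain(r["url"])],
    reverse=True,
)
print([r["url"] for r in custom])
```

The creator's knowledge changes both what is searched and how it is ranked, while the crawl and base scoring come from the platform.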
RDF & RSS
While at Apple, we created MCF as an attempt to make structured data a first-class citizen on the Web. It introduced simple knowledge representation ideas, notably directed labelled graphs, as a general data model for structured data on the Web. Later, at Netscape, this was submitted to the W3C (MCF Using XML), which eventually evolved into the RDF family of standards, including some which I authored, like RDF Schema.
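The data model is just subject-predicate-object triples, which together form a directed labelled graph. The sketch below (with invented resource names) shows a graph reduced to a set of triples and a lookup that follows labelled edges.

```python
# A tiny directed labelled graph as a set of
# (subject, predicate, object) triples -- the RDF data model.
# Resource names are invented for illustration.
triples = {
    ("ex:Guha", "ex:worksOn", "ex:RDF"),
    ("ex:RDF", "rdf:type", "ex:Standard"),
    ("ex:RDF", "ex:evolvedFrom", "ex:MCF"),
}

def objects(subject: str, predicate: str) -> set[str]:
    """Follow labelled edges out of a node."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("ex:RDF", "ex:evolvedFrom"))  # → {'ex:MCF'}
```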
In 1999, Eckart Walther and I created the first version of RSS as a mechanism for obtaining content for Netscape's portal. It seems to have survived beyond Netscape.
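An RSS feed is simply an XML document describing a channel and its items. The stdlib sketch below builds a minimal RSS 2.0 feed; the titles and links are invented, but the element structure (`rss`/`channel`/`item`) is the standard one.

```python
import xml.etree.ElementTree as ET

# Build a minimal RSS 2.0 feed (content invented for illustration).
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Feed"
ET.SubElement(channel, "link").text = "https://example.com/"
ET.SubElement(channel, "description").text = "An illustrative feed."

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "First post"
ET.SubElement(item, "link").text = "https://example.com/first"

feed = ET.tostring(rss, encoding="unicode")
print(feed)
```

A portal (or any reader) polls this document and extracts the items, which is exactly the content-syndication role it played at Netscape.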
Cyc
I spent my twenties on the Cyc project at MCC. It was an attempt to create a system capable of basic common sense reasoning using the kind of architecture advocated by the symbolic logic tradition (McCarthy, Feigenbaum, and others). I left the project at the end of 1994 and Doug Lenat carried on with it at Cycorp.
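That style of reasoning can be caricatured as forward chaining: explicit facts plus if-then rules, applied until nothing new follows. The rules below are toy stand-ins, not Cyc's actual representation, which was far richer.

```python
# Toy forward chaining in the symbolic-logic style: facts plus
# if-then rules, applied until no new facts emerge.
facts = {("Fido", "isa", "Dog")}
rules = [
    # if X isa Dog then X isa Mammal
    (("isa", "Dog"), ("isa", "Mammal")),
    # if X isa Mammal then X has Fur
    (("isa", "Mammal"), ("has", "Fur")),
]

changed = True
while changed:
    changed = False
    for (p1, o1), (p2, o2) in rules:
        for s, p, o in list(facts):
            if p == p1 and o == o1 and (s, p2, o2) not in facts:
                facts.add((s, p2, o2))
                changed = True

print(sorted(facts))
```

The hard part, and Cyc's actual ambition, was not the chaining but authoring the millions of facts and rules that common sense requires.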
Many ideas from that project flowed into the Semantic Web, the use of Knowledge Graphs in search, etc.