BFO-CCO Office Hours

Given the growing importance of Basic Formal Ontology (BFO) and the Common Core Ontologies (CCO) suite in defense and intelligence, biology and medicine, and service and manufacturing, there is a need for transparency concerning the development and maintenance of these artifacts.

With this in mind, the lead developers of BFO and CCO will hold biweekly "office hours" for stakeholders with questions, concerns, comments, or compliments regarding these standards. These office hours will be stakeholder-led: discussion during each one-hour session will be driven by the stakeholders in attendance.

Logistics

  • When: Biweekly on Fridays, starting May 31, 2024, 11 am - 12 pm

  • Where: Virtual Meetings on Teams

If you are interested in joining one or more office hours, please contact John Beverley at johnbeve[@]buffalo.edu. You will be provided a Teams invite for the scheduled time.

In addition to the biweekly office hours, there is an associated Slack channel for the group where stakeholders may continue the conversation with the BFO and CCO leads. As above, please contact John Beverley at johnbeve[@]buffalo.edu to be added to the BFO-CCO Office Hours Slack channel.

As stakeholder questions are addressed, we will also establish an "FAQ" where stakeholders will be directed for vetted answers to commonly posed questions.

Recent NCOR Accepted Work

Members of NCOR in Buffalo have been rather busy submitting work for the 2024 conference season. Here are a few submissions accepted to various conferences (full papers unless otherwise stated):

Convergence of LLMs and Ontologies

The rapid development and deployment of Large Language Models (LLMs) has led to growing interest in leveraging ontologies and knowledge graphs to enhance LLM capabilities and address their limitations. Combining the semantically rich architectures provided by ontologies and knowledge graphs with the generative strengths of LLMs promises a path towards more explainable artificial intelligence systems, more trustworthy output, and a deeper understanding of vulnerabilities arising from integrated architectures.

On July 15th I will be hosting a Workshop on the Convergence of Large Language Models and Ontologies, as part of the 2024 Formal Ontology in Information Systems conference in Enschede, Netherlands. This workshop, associated with a special issue of Applied Ontology, is dedicated to exploring the convergence of knowledge representation and LLM strategies, design patterns, models, and benchmarks. We aim to bring together researchers, practitioners, and enthusiasts from industry, academia, and government in the interest of exploring possible convergence points and advancing each field.

More information as the program develops can be found here.

Drowning in a Rising Tide

Image from Gartner Newsroom

Gartner’s recent survey of nearly 500 chief data and analytics officers (CDAOs) highlights a shortage of skilled staffing for Generative AI (GenAI). Over half of the CDAOs surveyed are at least piloting GenAI, which makes hiring skilled new talent all the more pressing.

Knowledge representation is increasingly important for GenAI. Ontologies and knowledge graphs are particularly vital given how they embed formal relationships across data, helping organizations make better decisions and understand their business processes.

I cannot help but worry, however, that market demand for knowledge representation will continue to outstrip the supply of qualified practitioners, so that companies will do what they’ve done in the past: hire just about anyone who claims to understand knowledge graphs. The concern is that hiring unqualified individuals will lead to unsatisfactory deliverables, which will in turn lead to complaints that knowledge representation is the problem, rather than the lack of talent.

We are training new ontologists here at the University at Buffalo, as quickly as we can. I hope those sympathetic to the promise of semantic interoperability are doing the same.

Enhanced Object-Based Production Conference Part 2

March 21-22, 2024, Tampa FL

Hosted by Celestar

The Enhanced Object-Based Production (EOBP) conference (website and speaker slides here) marked a significant collaborative effort by:

  • Celestar Corporation

  • The National Center for Ontological Research (NCOR)

  • Summit Knowledge Solutions

  • RTX Corporation

  • CUBRC

  • SAIC

  • Maxar Technologies

  • Sensepoint

  • Senior Government representatives

With approximately 45 participants, the conference aimed to leverage Object-Based Production (OBP), ontology, and Referent Tracking methodologies to enhance intelligence workflows and data management strategies.

The event kicked off with presentations on foundational concepts such as the Basic Formal Ontology (BFO), led by John Beverley, assistant professor at the University at Buffalo and co-director of NCOR, who emphasized the critical role of interoperability and data integration. Barry Smith, professor at the University at Buffalo and co-director of NCOR, explored Referent Tracking, a precise methodology for tracking entities across data sets, which underpins reliable data referencing and interoperability. Jim Tuson discussed the nuances of OBP, which focuses on the systematic handling of objects of interest to streamline intelligence processes.

The afternoon sessions delved deeper into Object-Based Intelligence and Production (OBI/OBP), with John Sweet providing insights into how real-world objects are encapsulated within databases to bolster intelligence analysis. Forrest Hare followed with a discussion of the notion of a 'track' and the challenges and solutions involved in tracking objects across space and time. The keynote by David Limbaugh underscored the potential of enhancing OBP through ontology-based approaches, proposing the adoption of realist ontologies to ensure data model precision and reusability.

Day two commenced with Mark Jensen discussing 'stasis' in the Common Core Ontologies (CCO), which facilitates stability in data models amidst change, followed by Erik Thomsen's presentation on the significance of composable and strongly typed ontologies. These sessions highlighted the necessity of robust ontologies that can adapt to complex information challenges, enhancing knowledge management in intelligence operations.

The conference concluded with a discussion of spatial modeling within an ontological framework by John Beverley, and a discussion of government data organization strategies, with notable presentations by Bill Mandrick and Ryan Riccucci. Mandrick's talk suggested the addition of new terms to BFO and explored the taxonomy of functions, while Riccucci proposed more efficient data organization methods to alleviate the costs of government field actions.

This gathering not only fostered a deeper understanding of EOBP but also set the stage for continued advancements in integrating ontology with object-based production, aiming for enhanced methodological cohesion and efficiency in intelligence operations. Plans for a follow-up conference are already underway, promising further progress in these critical areas.

Common Core Ontologies Governance Board

The Common Core Ontologies (CCO) [1] have become, over the last decade, an increasingly important resource for the U.S. Government. This mid-level ontology suite is currently used by dozens of organizations and deployed in critical systems in active operation. To date, CCO has been developed and maintained by Ron Rudnicki and the ontology team at CUBRC, Inc., who have overseen successive releases of CCO that have always sought to meet the needs of a growing number of end users. Through the ingenuity and discipline of this team, CCO has remained a touchstone for ontology development, reducing the time needed to develop jointly interoperable high-quality domain ontologies aligned to Basic Formal Ontology (BFO). As a result, CCO has continued to see greater adoption, and will become a central component of the DoD-IC Ontology Foundry. In particular, both BFO and CCO have been directed for use across these communities as baseline standards for ontology development.

In recognition of the need for CCO to continue to scale and evolve, future releases of CCO will be overseen by The Common Core Governance Board, which will have an established charter and bylaws and will be composed of representatives from stakeholder organizations that have been involved in CCO development over the past decade. Initial members will include:

This board shall be charged with ensuring that CCO is openly available, well-maintained, responsive to user needs and technological and theoretical changes, and independent of any undue influence imposed by a single project or organization. Additionally, the board will pursue:

  • securing funding for the maintenance and development of CCO

  • ensuring that CCO is adopted as an IEEE standard mid-level ontology

  • creating a developer’s group for CCO that is empowered but subject to clear oversight

  • organizing conferences, virtual meetings, and so on in service of the CCO community

  • maturing CCO’s release process and associated documentation

  • encouraging academic research and the creation of robust, reusable domain ontologies under CCO

  • stabilizing CCO to ensure future releases are transparent and mindful of impacts on end users

  • coordinating with The Industrial Ontology Foundry, The Open Biomedical and Biological Ontology Foundry, and The DoD-IC Ontology Foundry

  • ensuring that CCO is responsive to the needs of U.S. Government stakeholders

[1] https://github.com/CommonCoreOntology/CommonCoreOntologies

2024 Enhanced Object-Based Production Conference

The 2024 Enhanced Object-Based Production Conference will be held in Tampa, Florida on March 21-22.

Enhanced Object-Based Production (EOBP) is a data structuring methodology that focuses on organizing information around any "Portion of Reality" (POR). It is based on a solid foundation of ontology and semantics, allowing for structured and meaningful categorization of data.

The event will be hosted by Celestar Corporation, in coordination with the University at Buffalo and the National Center for Ontological Research.

Further details about the scheduled talks, hotels, venue, and operations committee can be found here.

Specter of Artificial General Intelligence

In a recent post, Daniel Kelly - UB Director of Administration and Strategy - highlights the work of my colleague Barry Smith and his Senior Research Associate Jobst Landgrebe in their 2023 book titled “Why Machines Will Never Rule the World: Artificial Intelligence Without Fear”. Kelly takes a sobering view of the prospects of general artificial intelligence, observing “There has never been a better time to focus on developing the moral compasses of each and every student so that they are adequately capable of ethically operating and utilizing the increasingly potent tools that will be available to them.”

What goes for our students goes equally well for us old dogs who find themselves learning new tricks.

The Importance of Interoperability in Ontology: Case Study on DBpedia

Author: Carter-Beau Benson

Semantic interoperability streamlines our ability to process and analyze vast data sets and creates a unified, efficient approach to understanding information. This article sheds light on the significance of interoperability and reveals the potential pitfalls of a crowd-sourced resource like DBpedia, with a particular focus on how sports players are connected to teams across different sports.

DBpedia: Linked Data Powered by Wikipedia

DBpedia is a community-driven project that extracts structured information from Wikipedia and makes it freely available on the web, thereby transforming it into a resource that can be queried and linked to other datasets. DBpedia serves as a semantic layer over Wikipedia, converting human-readable content into a machine-readable format. This semantic layer is constructed using RDF (Resource Description Framework) technology and is queryable using the SPARQL query language. The relationship between Wikipedia and DBpedia is akin to that of a source and its structured reflection; while Wikipedia provides the raw, textual information, DBpedia organizes this information into categories, relationships, and other semantic constructs. This structured data allows for more sophisticated queries and facilitates interoperability with other semantic web technologies.
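
To give a sense of how this works in practice, here is a minimal sketch of a query against DBpedia's public SPARQL endpoint. It simply retrieves the English label of the resource for Babe Ruth (a resource that appears again in the examples below); the choice of resource is purely illustrative:

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE {
  dbr:Babe_Ruth rdfs:label ?label .
  FILTER(lang(?label) = "en")
}

Queries like this are what make the structured layer valuable: because DBpedia reuses stable identifiers, its data can be joined with other linked datasets that reference the same resources.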

Unlocking Sports Trivia with Immaculate Grid…Kinda…

Immaculate Grid is a unique sports trivia game from Sports Reference that challenges players to flex their sports knowledge in an interactive way. The game consists of a grid in which each square sits at the intersection of a column and a row, each with its own criterion, such as a team, an award, or a specific stat. Players are tasked with filling in each square with an answer that satisfies the criteria of both its column and its row. DBpedia, which can be queried using SPARQL, can serve as a powerful resource for players looking to accurately fill in the squares on the Immaculate Grid while also achieving the highest rarity score, offering a data-driven approach to mastering the game.

Well, at least you’d think DBpedia would be a powerful resource for playing Immaculate Grid. There are challenges, however, stemming from the way players from different sports are connected to their respective teams:

  1. Baseball players are linked to teams using the dbp:teams relation, which is followed by a string literal. For example, Babe Ruth has the dbp:teams property connecting him to "The Boston Red Sox" and "The New York Yankees".

  2. Basketball players are linked to teams using the dbp:team relation followed by a dbr:Team_Name resource. For example, Magic Johnson has the object property dbp:team that connects him to dbr:Los_Angeles_Lakers.

  3. Hockey players are linked to teams using the dbp:playedFor relation followed by a dbr:Team_Name resource. For example, Wayne Gretzky is connected to the teams he made appearances for by the property dbp:playedFor followed by dbr:Edmonton_Oilers, dbr:Los_Angeles_Kings, and dbr:New_York_Rangers.

It's evident from these examples that, although the relation essentially conveys the same meaning—that a player plays for a certain team—the way it's represented varies dramatically across sports. This inconsistent representation is not just confusing but makes querying this data cumbersome and time-consuming.

To find baseball players who played for both the Atlanta Braves and the Minnesota Twins, we can use the following SPARQL query:

PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?player
WHERE {
  ?player dbp:teams ?teams .
  FILTER(CONTAINS(str(?teams), "Atlanta Braves") && CONTAINS(str(?teams), "Minnesota Twins"))
}

This SPARQL query is designed to search for baseball players who have played for both the Atlanta Braves and the Minnesota Twins. What's unique about this query is that it's using string literals to search through the teams each player has been associated with. That means it's looking for specific text—'Atlanta Braves' and 'Minnesota Twins'—within the property that lists a player's teams. This is different from searching based on formal database identifiers or 'resources,' which would be the more typical way to make this kind of query. By using the “CONTAINS(str(?teams), "Atlanta Braves") && CONTAINS(str(?teams), "Minnesota Twins")” portion of the query, we can filter results to only include players who have both 'Atlanta Braves' and 'Minnesota Twins' as part of the string that describes the teams they've played for. It's a useful, albeit less precise, way to scrape this information from the database.

Now consider this SPARQL query, which returns players who played for both the Buffalo Sabres and the Montreal Canadiens:

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?player ?playerLabel
WHERE {
  ?player rdf:type dbo:HockeyPlayer ;
          dbp:playedFor dbr:Buffalo_Sabres ;
          dbp:playedFor dbr:Montreal_Canadiens ;
          rdfs:label ?playerLabel .
  FILTER(lang(?playerLabel) = "en")
}

Unlike the previous query which used string literals to search through teams, this query relies on formal database identifiers known as 'resources' to pinpoint the teams. Let's break it down:

  • ?player rdf:type dbo:HockeyPlayer: States that we are interested in entities that are of the type 'Hockey Player'.

  • ?player dbp:playedFor dbr:Buffalo_Sabres and ?player dbp:playedFor dbr:Montreal_Canadiens: Set the criteria for the teams the player must have played for. Notice how specific team identifiers (dbr:Buffalo_Sabres, dbr:Montreal_Canadiens) are used here, offering more precise results than string literals would.

  • ?player rdfs:label ?playerLabel: Fetches the label (usually the name) of the player.

  • FILTER(lang(?playerLabel) = "en"): Ensures that the label is in English.

By using specific identifiers for both the player type and the teams, the query benefits from the precise, interconnected nature of semantic web data, ensuring a higher level of accuracy in the returned results.

The hockey query demonstrates a more robust and precise approach compared to the baseball query by leveraging the formal identifiers and structured relationships inherent to semantic web data. Specifically, the hockey query uses 'resources' as database identifiers for both the player type and the teams, which allows for more accurate filtering. This enhanced accuracy is especially beneficial when considering historical nuances, such as team relocations or rebrandings. For instance, a query relying on formal identifiers could easily accommodate a player who had played for a team before and after it was relocated, thanks to the interconnected nature of semantic web data. Using the example of the Atlanta Braves, who were once the Milwaukee Braves, the value for dbp:teams as a resource could include historical data linking the two names. Thus, a player who played for both the Milwaukee Braves and the Minnesota Twins would still satisfy the query criteria, assuming the data model appropriately accounts for the Braves' history. This interconnected, historical awareness is something the string-based query would struggle to accommodate without manual adjustments.
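
The same resource-based style also covers the basketball pattern from item 2 above. As a minimal sketch, assuming players are typed as dbo:BasketballPlayer and linked to teams via dbp:team as described earlier (the choice of teams here is purely illustrative), a query for players who appeared for both the Los Angeles Lakers and the Phoenix Suns might look like this:

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?player ?playerLabel
WHERE {
  ?player rdf:type dbo:BasketballPlayer ;
          dbp:team dbr:Los_Angeles_Lakers ;
          dbp:team dbr:Phoenix_Suns ;
          rdfs:label ?playerLabel .
  FILTER(lang(?playerLabel) = "en")
}

Note that only the class and property names change relative to the hockey query; the shape of the query is identical, which is precisely the consistency the string-based baseball pattern lacks.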

The Ideal State of Affairs

In an ideal scenario, not only would the triple that connects a player to a team use a resource in the object place, but the connection between a player and their team would also be standardized across all sports. A unified relation would not only streamline the querying process but also ensure that data from multiple sources can easily be integrated and understood in a consistent manner. The Immaculate Grid sheds light on the necessity of such standardization. To write a SPARQL query that accommodates the varying relations, a querier must first navigate the underlying data to pinpoint the right property for each sport. Such a process is not only inefficient but also detracts from the primary goal of data analysis and interpretation.
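
To make the point concrete, here is a minimal sketch of what a query could look like under such a standardized scheme. The class ex:Athlete and the property ex:playsFor are hypothetical placeholders, not existing DBpedia terms; the point is simply that a single relation would serve every sport:

PREFIX ex: <http://example.org/sports/>
SELECT DISTINCT ?player
WHERE {
  ?player a ex:Athlete ;
          ex:playsFor ex:Atlanta_Braves ;
          ex:playsFor ex:Minnesota_Twins .
}

With one shared relation, the same query shape works whether the teams are baseball, basketball, or hockey teams; only the team resources change.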

DBpedia's crowd-sourced nature is both its strength and its weakness. While it allows for a vast array of data to be collected and presented, it lacks the centralized oversight needed to ensure ontological consistency. Our example illustrates this challenge well: what is essentially the same relation is expressed in three separate ways.

Moral of the Story

Interoperability is not just a buzzword—it's an essential cornerstone for effective and efficient data management. Platforms like DBpedia can offer an invaluable resource to the global community, but without standardized ontological structures, their utility is compromised. With standardized relations, we can hope to achieve a seamless data analysis experience, where the primary focus remains on deriving insights rather than grappling with inconsistent terminologies.

Enhanced Object-Based Production Workshop

Enhanced Object Based Production Conference
June 21-22, 2023
SAIC, Arlington, Virginia

The Science Applications International Corporation (SAIC) recently hosted the Enhanced Object Based Production (EOBP) conference in Arlington, Virginia. The conference brought together faculty from the University at Buffalo working in the field of applied ontology, with members of the intelligence community and industry leaders for crucial discussions aimed at improving the methodologies and techniques of intelligence work. The primary objective was to foster the alignment of ontology development strategies with those of the Object Based Production (OBP) community.

The UB Department of Philosophy has for decades been a major player in the creation and dissemination of ontologies – logically well-defined controlled vocabularies of terms and relationships among them representing entities in a given domain – across government, academic, and industry spaces. The department is home to Dr. Barry Smith, the creator of Basic Formal Ontology, a top-level ontology architecture used by over 700 ontology projects internationally. Dr. Smith was joined by a number of UB-affiliated ontologists, including UB ontology graduate student Carter Benson, who organized the event, and Dr. Werner Ceusters, a lead figure in the field of Referent Tracking, an application of ontologies to tracking biomedical health records.

Representing OBP were pioneers of the field – Jim Tucson of SAIC and Geoff X. Davis of Celestar Corporation. OBP is a methodology used in the intelligence community that focuses on specific entities, known as "objects," as the core of information processing and analysis. These objects could be people, places, things, or any identifiable entities that intelligence officers are interested in. OBP offers several benefits. First, it simplifies complex data environments by focusing on the key entities. This makes it easier for intelligence officers to identify patterns, correlations, or unusual activities linked to these entities. Second, OBP allows for more efficient sharing of information, as data linked to specific objects can be readily exchanged between different teams or departments.

The conference name reflects the union of applied ontology strategies, following the design principles of BFO and Referent Tracking, with those of OBP. The result, Enhanced OBP (EOBP), is a methodology for structuring data that allows information to be centered on any portion of reality in the interest of improving provenance tracking. Consequently, EOBP touches on questions of document reliability, evidence, and justification for use in decision-making.

Day one began with a deep dive into the foundational concepts of BFO led by Dr. Smith, which was followed by a detailed overview of Referent Tracking by Dr. Ceusters. Highlighting the first day was a keynote by Dr. Forrest Hare (Summit Knowledge Solutions), who shed light on the operational gaps within the intelligence community and how these can be addressed through the deployment of the Defense Intelligence Core Ontology (DICO), which is based on BFO and specific to content used across the defense and intelligence industry.

Day two of the conference saw discussions pivot towards the Provenance Ontology – a widely used ontology for representing the tracking and origination of documents – and its mapping into BFO, led by UB ontology graduate student Austin Liebers. Here again Dr. Hare made an appearance, providing an overview of RDF-star and its relationship to ontology development and EOBP. Subsequently, Dr. Smith presented work developed by Mark Jensen and Dr. David Limbaugh of the Calspan University at Buffalo Research Center, which further emphasized connections between ontology representation strategies and EOBP, providing attendees with a broader perspective on the subject.

Attendees were also treated to an insightful presentation by Dr. Ryan Riccucci (Customs and Border Protection), who detailed the history of intelligence workflows based on his experience with the agency, followed by a call for the use of ontologies to address the challenges encountered. Dr. Riccucci’s comprehensive presentation underscored the importance and potential applications of applied ontology and EOBP in modern intelligence operations.

The EOBP conference marked a substantial step towards wider adoption and implementation of applied ontology strategies combined with traditional object-based production within the intelligence community. Participants left with an in-depth understanding of the theory and practicalities of EOBP, its ontological underpinnings, and its applications in intelligence operations. Planning has, as a matter of fact, already begun for a follow-up conference to continue progress on these important issues.

Carter Benson
Research Assistant, Department of Philosophy

Source: ...

Family Feud

One of the perks of being an assistant professor at UB is having the opportunity to listen to Barry Smith give entertaining, thought-provoking talks. This semester, Barry delivered a talk to my Logic of Ontologies seminar, which is well worth a listen. By way of preface, a bit of back-and-forth:

Question: Name a technology that took the world by storm in 2023.
Answer: ChatGPT
Question: Name a quiz-style, American TV show hosted by Steve Harvey.
Answer: Family Feud
Question: Name a YouTube video that combines absurd LLM responses with a persistent line of Family Feud-inspired questions.
Answer:



Erdos Number Now with More Dijkstra!

I recently discovered I have an Erdős number of 3, i.e., I’ve published with someone who is two steps removed from publishing with Paul Erdős, the “Oddball’s oddball.”

The links are as follows:

  • John Beverley coauthored with Yang Wang

  • Yang Wang coauthored with Frank Hsu

  • Frank Hsu coauthored with Paul Erdős

The distribution of Erdős numbers is as follows (from “Facts about Erdős Numbers”):

      Erdös number  0  ---      1 person
      Erdös number  1  ---    504 people
      Erdös number  2  ---   6593 people
      Erdös number  3  ---  33605 people +1 John Beverley
      Erdös number  4  ---  83641 people
      Erdös number  5  ---  87760 people
      Erdös number  6  ---  40014 people
      Erdös number  7  ---  11591 people
      Erdös number  8  ---   3146 people
      Erdös number  9  ---    819 people
      Erdös number 10  ---    244 people
      Erdös number 11  ---     68 people
      Erdös number 12  ---     23 people
      Erdös number 13  ---      5 people

I’ve also discovered that I have a Dijkstra number of 4, through the following links:

  • John Beverley coauthored with Gilbert Omenn

  • Gilbert Omenn coauthored with Todd Smith

  • Todd Smith coauthored with Jayadev Misra

  • Jayadev Misra coauthored with Edsger Dijkstra

All of the above information was pulled from co-authors.net.

Virus Infectious Disease Ontology

Read a draft of the full paper here.

Abstract: Information emerging from life science research has increasingly been recorded electronically and stored in databases. The sheer volume of data collected by researchers, the speed at which it is generated, the range of its sources, and the need to assess its quality, accuracy, and usefulness result in complex, multidimensional, diverse datasets, often annotated in specific terminologies and coding systems by researchers in distinct disciplines. The resulting data silos undermine interoperability, meta-data analysis, reproducibility, pattern identification, and discovery across disciplines. The value of cross-discipline meta-data analysis is, however, evident in the present pandemic. Prostate cancer researchers have leveraged existing research on enzymes crucial in host cell penetration by SARS-CoV-2 to explain differences in disease severity across sex. Immunologists have combined insights from research on SARS-CoV-1 and MERS-CoV with chemical compound profile data to identify drug and vaccine options for SARS-CoV-2. Pediatric researchers, observing that children have fewer nasal epithelia susceptible to SARS-CoV-2 infection than adults, have suggested this difference partially explains symptom disparities between the groups. Researchers across the life sciences are recognizing the pressing need for coordinated data-driven efforts during the current crisis.

Shared, interoperable, logically well-defined, controlled vocabularies representing common entities and relations across life science disciplines facilitate data-driven insights across those disciplines. The present need for rapid analysis of evolving datasets representing coronavirus research motivates, moreover, the development of virus, coronavirus, and SARS-CoV-2 specific vocabularies. To these ends, we have developed the Virus Infectious Disease Ontology (VIDO; https://bioportal.bioontology.org/ontologies/VIDO) and the COVID-19 Infectious Disease Ontology (IDO-COVID-19; https://bioportal.bioontology.org/ontologies/IDO-COVID-19). Each is a structured vocabulary, with textual definitions for terms and relations, as well as logical axioms expressed in the OWL 2 Web Ontology Language (https://www.w3.org/TR/owl2-overview/), a World Wide Web Consortium (https://www.w3.org/) language developed for the semantic web. The formal representations of these ontologies support automated consistency checking, querying over relevant datasets, and interoperability with existing data on the semantic web. VIDO is an extension of the widely used Infectious Disease Ontology Core (IDO Core; https://bioportal.bioontology.org/ontologies/IDO), an ontology comprised of terminological content common to all investigations of infectious disease. VIDO is a refinement of IDO to the specific domain of infectious diseases caused by viruses. As such, VIDO comprises terminological content common to investigations of viral diseases, including virus classification, epidemiology, replication, vaccinology, and rational viral drug design. VIDO provides a carefully curated foundation for ontologies representing specific viral infectious diseases such as IDO-COVID-19, an extension of VIDO to the specific disease COVID-19 and its causative virus SARS-CoV-2.