Researching and analyzing investment opportunities can be challenging for asset management—private equity and hedge fund portfolio managers, researchers, and analysts—because, of course, you want to make sure that you're a good steward of your client's investments.
And when you find and source data, such as traditional or alternative data, you also want to make sure it's reliable and that the methods used to gather it are tried and true.
This article aims to give you an inside look into SESAMm's knowledge graph—one of the key reasons SESAMm's NLP-derived alternative data is reliable and trusted. We'll explain what a knowledge graph is, why it's important, how it works, and what makes SESAMm's knowledge graph unique.
What is a knowledge graph?
A knowledge graph is a digital representation of a network of real-world entities, the foundation of a search engine or question-answering service. This structured data model puts the schema in context through linking and semantic metadata, providing a framework for data integration, analytics, unification, and sharing. In other words, it's like a map and legend, with the legend labeling the concepts, entities, and events and the map connecting and identifying their relationships. These details are stored in a graph database and visualized as a graph representation, hence the term knowledge graph.
Fun fact: The expression, knowledge graph, gained popularity after Google used it in 2012 to name their semantic network.
Two types of knowledge graphs
There are two general types of knowledge graphs: open and private. Open knowledge graphs are open to the public. They're created and made available by organizations such as Wikidata, DBpedia, and Yago. Private knowledge graphs are often only used by organizations that create them, like Google, WolframAlpha, Facebook, and SESAMm (of course). Some offer them up for a fee or subscription, such as Crunchbase and OpenCorporates.
Why a knowledge graph is important
Knowledge graphs are important because they equip us with a model to see how everything relates from a big-picture view, creating new knowledge. Its benefits include:
- Incorporating disparate data sources, avoiding data silos
- Integrating structured and unstructured data
- Revealing insights from hierarchical data
- Outlining relationships
- Defining communities
Knowledge graphs inform machine learning algorithms
From a data science and artificial intelligence (AI) perspective, knowledge graphs provide machine-readable details, adding context and depth to data-driven AI techniques such as machine learning. Using knowledge graphs and machine learning models together improves system accuracy and extends the range of machine learning capabilities for better explainability and trustworthiness.
How a knowledge graph works
The core of a knowledge graph is its knowledge model, a collection of interconnected descriptions of concepts, entities, events, and relationships known as an ontology. This model provides a framework for statements or taxonomy. Each statement consists of a subject, predicate, and object (Figure 1)—known as a triple model—and each subject or object is represented only once in the context of the other subjects and their relationships. For example, in this simple sentence, "The boy kicks the ball," The boy is the subject, and kicker is the predicate because he kicks the ball, the object.
Figure1: Apple is the subject, chief executive officer is the predicate, and Tim Cook is the object.
Likewise, each statement consists of three components: nodes, edges, and labels. A node, or vertice, represents an entity, which can be anything existing in the real world, such as a person, company, or object. For instance, in this example (Figure 2), Barack Obama is the subject node, Malia and Sasha are object nodes, and the edges, or relationships, are labeled as father or sibling, respectively.
Figure 2: How the relationships between nodes can be labeled.
What makes SESAMm's knowledge graph unique?
SESAMm uses open and private datasets with custom, curated information to create our proprietary knowledge graph. As a result, the knowledge graph is a vast map connecting and integrating over 70 million related entities and their keywords, relating each organization to its brands, products, associated executives, names, nicknames, and exchange identifiers in the case of public companies from a data repository made up of more than 18 billion articles and messages and growing.
The knowledge graph is updated regularly
Entities within the knowledge graph are updated weekly and tagged to ensure we correctly track their changes. For instance, the CEO of a company today might not be its CEO tomorrow. And brands might be bought and sold, changing the parent company with each sale. So, weekly updates within the knowledge graph ensure the system is aware of these changes.
NLP-driven accuracy
At SESAMm, named entity disambiguation (NED), a natural language processing (NLP) technique, identifies named entities based on their context and usage. Text referencing "Elon," for example, could refer indirectly to Tesla through its CEO or to a university in North Carolina. Only the context allows us to differentiate, and NED considers that context when classifying entities. This method is superior to simple pattern matching, which limits the number of possible matches, requires frequent manual adjustments, and can't distinguish homophones.
SESAMm uses three other NLP tools to identify entities and create actionable insights: lemmatization, embeddings, and similarity. The lemmatization process normalizes a word into its base form (morphology) to help identify and aggregate entities. Embedding assigns the entity a numerical value to help analyze how words change meaning depending on context and understand the subtle differences between words that refer to the same concept. Similarity measures whether two words, sentences, or objects are close to one another in meaning.
Learn more: “Gain Insights From Financial and ESG Data Using AI: A Comprehensive Guide.”
How SESAMm's knowledge graph benefits you
SESAMm tailored its knowledge graph to find, extract, and analyze data about public or private entities, which isn't readily available from the web or standard rating firms. This unique implementation of a knowledge graph provides insights to give you an edge when researching, analyzing, and submitting recommendations to the portfolio manager or clients.
SESAMm's premiere platform, TextReveal®, allows you to leverage NLP-driven insights fully and receive high-quality results through data streams, modular API and dashboard visualization, and signals and alerts. It's perfect for many quantitative, quantamental, and ESG investment use cases.
Learn how SESAMm can support you in your investment decision-making and request a demo today.