Graph Theory is the mathematical study of the properties and applications of graphs: mathematical structures used to model pairwise relations between objects. Graphs, also referred to as networks, consist of a set of vertices (nodes, points) connected by edges (links, lines).

Graph Theory can be applied to Network Analysis, Link Analysis and Social Network Analysis. These types of analysis borrow notation from Graph Theory and focus on investigating social structures represented as networks, applying a variety of mathematical, computational and statistical techniques.

Graph Theory was first introduced and studied in 1736 by Leonhard Euler, who was interested in solving the Königsberg Bridge Problem. Königsberg was a city in Prussia (now Kaliningrad, Russia) with the river Pregel flowing through it, creating two islands. The city and islands were connected by seven bridges. The goal of the Königsberg Bridge Problem was to devise a walk through the city that crossed each of the seven bridges once and only once, with no doubling back.

Leonhard Euler's work with Graph Theory proved that the Königsberg Bridge Problem was unsolvable. More notable names in the evolution and growth of Graph Theory are A. F. Möbius (1840), Gustav Kirchhoff (1845), H. Dudeney (1913) and Heinrich (1969), among others.

One way of utilizing Graph Theory in Network Analysis is with a powerful Python library called NetworkX. NetworkX is an open-source Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.

The following Python code provides an in-depth illustration of how Graph Theory can be used in Network Analysis. It is applied to airport data, where the nodes are airport abbreviations and the edges represent the distances between those airports.
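A minimal sketch of building such a graph with NetworkX, using a handful of US airport codes and illustrative distances in miles (the specific routes and values below are made up for the example, not real data):

```python
import networkx as nx

# Build an undirected graph: nodes are airport abbreviations,
# edge weights are distances in miles (illustrative values)
G = nx.Graph()
routes = [
    ("JFK", "LAX", 2475),
    ("JFK", "ORD", 740),
    ("ORD", "LAX", 1744),
    ("ORD", "DFW", 802),
    ("DFW", "LAX", 1235),
]
G.add_weighted_edges_from(routes)

print(G.number_of_nodes())         # 4
print(G.number_of_edges())         # 5
print(sorted(G.neighbors("ORD")))  # ['DFW', 'JFK', 'LAX']
```

`add_weighted_edges_from` stores each distance in the edge's `weight` attribute, which NetworkX's path and centrality algorithms can then use directly.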

NetworkX is a powerful Python library for experimenting with Graph Theory and Network Analysis. It is a strong tool for studying the connections between nodes and can be used to derive statistics that pull out useful information about the structure, distribution and dynamics of a graph.
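As a sketch of the kind of statistics mentioned above, the following computes degree centrality and a distance-weighted shortest path on a small illustrative airport graph (the codes and distances are made up for the example):

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("JFK", "LAX", 2475), ("JFK", "ORD", 740),
    ("ORD", "LAX", 1744), ("ORD", "DFW", 802), ("DFW", "LAX", 1235),
])

# Degree centrality: the fraction of other airports each node
# is directly connected to
centrality = nx.degree_centrality(G)

# Shortest route by total distance, using the edge weights
path = nx.shortest_path(G, "JFK", "DFW", weight="weight")
miles = nx.shortest_path_length(G, "JFK", "DFW", weight="weight")

print(path)   # ['JFK', 'ORD', 'DFW']
print(miles)  # 1542
```

Note that the cheapest route from JFK to DFW goes through ORD (740 + 802 = 1542 miles) rather than through LAX (2475 + 1235 = 3710 miles), which is exactly the kind of structural insight weighted graph algorithms surface.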

IBM Natural Language Understanding (NLU) can be used to analyze the semantic features of text such as the content of web pages, raw HTML and text documents. NLU also has the ability to analyze target phrases in context of the surrounding text for focused sentiment and emotion results.

The semantic features that can be extracted from URLs, raw HTML and text using NLU include:

- Categories
- Concepts
- Emotion
- Entities
- Keywords
- Metadata
- Relations
- Semantic Roles
- Sentiment

The first step is to sign up for a free IBM Cloud account and log in. Once logged in to IBM Cloud, create an instance of the Natural Language Understanding service. The next important step is to copy the credentials used to authenticate to the service instance you just created: the Manage page shows values such as the API Key and the URL. These values will be needed to write Python code using Natural Language Understanding in IBM Watson Studio.

Once an instance of NLU has been created, the API Key and URL credentials can be used in a Jupyter notebook in Watson Studio to write Python code that extracts semantic features from web pages, text documents and raw HTML. The following code example illustrates how this can be achieved.
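A minimal sketch using the `ibm-watson` Python SDK, assuming the placeholder credentials below are replaced with the API Key and URL copied from your own service instance, and that the feature options shown (entities, keywords, sentiment) are just one possible selection from the list above:

```python
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import (
    Features, EntitiesOptions, KeywordsOptions, SentimentOptions)

# Authenticate with the credentials from the service's Manage page
authenticator = IAMAuthenticator("YOUR_API_KEY")
nlu = NaturalLanguageUnderstandingV1(
    version="2021-08-01", authenticator=authenticator)
nlu.set_service_url("YOUR_SERVICE_URL")

# Analyze a web page; text= or html= can be passed instead of url=
response = nlu.analyze(
    url="https://www.ibm.com",
    features=Features(
        entities=EntitiesOptions(limit=5),
        keywords=KeywordsOptions(limit=5),
        sentiment=SentimentOptions(),
    ),
).get_result()

print(response["keywords"])
```

The same `analyze` call accepts a `text` or `html` argument in place of `url`, which is how raw text and HTML documents are handled. Running this sketch requires a live NLU service instance and valid credentials.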

NLU is a very versatile tool. In addition to extracting the above specified semantic features from text, HTML and web pages, it can also be used with Watson Knowledge Studio (WKS) to extract user-defined entities and relations. A model built in WKS can be deployed to NLU to extract entities and relations related to a certain topic or genre.

Python’s Fuzzywuzzy library provides many methods for computing a similarity measure between two strings. Its fuzz module contains several methods that compare two strings and return a value from 0 to 100 as a measure of similarity.

Strings that are completely different would have a similarity score of 0, whereas strings that are completely identical would have a similarity score of 100. Strings with some level of similarity fall between 0 and 100.

Fuzzywuzzy uses Levenshtein Distance to calculate the differences between sequences, and also utilizes the difflib library under the hood for its calculations.


The string comparison methods within the fuzz module of the Fuzzywuzzy library are:

- QRatio
- UQRatio
- UWRatio
- WRatio
- partial_ratio
- partial_token_set_ratio
- partial_token_sort_ratio
- ratio
- token_set_ratio
- token_sort_ratio

The following code demonstrates how some of these methods work on strings of varying similarity.

Fuzzywuzzy also provides a module called process, with methods for determining how similar one string is to each string in a list of other strings.


The methods in the process module are as follows:

- extract
- extractBests
- extractOne
- extractWithoutOrder
- dedupe

The following code demonstrates how some of these methods work.

The Fuzzywuzzy library has many use cases. Imagine trying to determine how similar names, addresses, telephone numbers and dates are between two datasets. This library can be used to find matching records between datasets where a similarity score makes more sense than a definitive True or False regarding equality.

Happy Learning!