1. Introduction
For this assignment you have to retrieve a number of simple sentences from an existing web page, extract the relevant information from these sentences and transform this information into an RDF knowledge graph with the help of the RDF Mapping Language (RML). Once this RDF knowledge graph is available, you have to display it in Turtle notation. You also have to write a number of SPARQL queries of your choice that extract information from the knowledge graph. Finally, you have to produce a 3-minute video that explains the details of your Python implementation and how you built the RML mapping rules.
Please download the folder "comp3220-assigment-3.zip" to start with this assignment. This folder contains a version of SDM-RDFizer, a configuration file "config.ini" for the RDFizer, an incomplete file "mapping.ttl" for the RML mapping rules to be added, and an HTML file "student.html" that contains the information to be extracted.
2. Extracting Information
Fetch the web page "student.html" via an HTTP request.
To do this, open a command prompt, change to the folder where the HTML file "student.html" is located, and start a simple Python HTTP server from the command line:
C:\> python -m http.server 8080
The Python program "student.py" should request the HTML document ("student.html") from this server via:
http://localhost:8080/student.html
You have to use the Python "requests" module for this task.
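The request step can be sketched as follows. This is a minimal illustration, not the required implementation: the helper name `fetch_html`, the stand-in page content, the temporary folder and the ephemeral port are all assumptions made so that the example runs without a second terminal. For the assignment, the URL would be `http://localhost:8080/student.html` (plain `http`, since `python -m http.server` does not serve TLS).

```python
import functools
import http.server
import tempfile
import threading
from pathlib import Path

import requests


def fetch_html(url, timeout=5.0):
    """Return the body of `url` as text, raising on HTTP error codes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 404, 500, ...
    return response.text


# Demo: serve a stand-in page from a temporary folder on a free port,
# mimicking `python -m http.server 8080` inside a single script.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "student.html").write_text(
        "<html><body><p>Robert is a person.</p></body></html>",
        encoding="utf-8",
    )
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=folder
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]  # port chosen by the OS
        html = fetch_html(f"http://127.0.0.1:{port}/student.html")
    finally:
        server.shutdown()
        server.server_close()

print(html)
```

Calling `raise_for_status()` means a wrong path or a server that is not running surfaces as an exception instead of an empty or misleading page.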
Afterwards, use the BeautifulSoup4 library to extract the raw text from the HTML file "student.html". Then use spaCy to split this text into sentences and store these sentences in a list:
['Robert is a person.',
'Robert is a friend of Alice.',
'Robert is born on 1998/07/04.',
'Robert was born in Sydney.',
'Robert is known as Bob.',
'Robert is interested in Artificial Intelligence.',
'Artificial Intelligence was created by John McCarthy.',
"The lecture 'Deep Learning' is about Artificial Intelligence."]
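A minimal sketch of the text and sentence extraction is shown below. The HTML snippet is a stand-in, since the real markup of "student.html" may differ, and the blank English pipeline with the rule-based `sentencizer` is an assumption chosen so that no model download is needed. A trained spaCy pipeline also exposes sentences through `doc.sents`.

```python
import spacy
from bs4 import BeautifulSoup

# Stand-in for the page; the real markup of "student.html" may differ.
html = """
<html><body>
  <p>Robert is a person. Robert is a friend of Alice.</p>
  <p>Robert is known as Bob.</p>
</body></html>
"""

# Drop the tags and join the remaining text nodes with single spaces.
soup = BeautifulSoup(html, "html.parser")
raw_text = soup.get_text(separator=" ", strip=True)

# Assumption: tokenizer plus rule-based sentence splitter, no trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

sentences = [sent.text.strip() for sent in nlp(raw_text).sents]
print(sentences)
```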
For each of these eight sentences, extract the subject (source), the predicate (edge), and the object (target), and store the resulting information in a pandas DataFrame as shown below. Again, you should use spaCy to extract the relevant information from these sentences.
   source                   edge              target
0  Robert                   is a              person
1  Robert                   is a friend of    Alice
2  Robert                   born on           1998/07/04
3  Robert                   born in           Sydney
4  Robert                   known as          Bob
5  Robert                   is interested in  Artificial Intelligence
6  Artificial Intelligence  created by        John McCarthy
7  lecture 'Deep Learning'  is about          Artificial Intelligence
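One way to arrive at such a DataFrame is sketched below. It is a deliberately simplified stand-in: the edge phrases are listed by hand and located with spaCy's `PhraseMatcher`, whereas an approach based on the dependency parse of a trained pipeline is probably closer to what the task intends. The names `EDGE_PHRASES`, `AUXILIARIES` and `split_sentence` are hypothetical, and the split of row 0 into "is a" and "person" is an assumption about the table above.

```python
import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher

sentences = [
    "Robert is a person.",
    "Robert is a friend of Alice.",
    "Robert is born on 1998/07/04.",
    "Robert was born in Sydney.",
    "Robert is known as Bob.",
    "Robert is interested in Artificial Intelligence.",
    "Artificial Intelligence was created by John McCarthy.",
    "The lecture 'Deep Learning' is about Artificial Intelligence.",
]

# Assumption: hand-written edge phrases instead of a dependency parse.
EDGE_PHRASES = [
    "is a friend of", "is a", "is interested in", "is about",
    "born on", "born in", "known as", "created by",
]
AUXILIARIES = {"is", "was"}  # dropped when they directly precede the edge

nlp = spacy.blank("en")  # tokenizer only, no trained model required
matcher = PhraseMatcher(nlp.vocab)
matcher.add("EDGE", [nlp.make_doc(phrase) for phrase in EDGE_PHRASES])


def split_sentence(sentence):
    """Split a sentence into (source, edge, target) around a known edge."""
    doc = nlp.make_doc(sentence.rstrip("."))
    matches = matcher(doc)
    if not matches:
        return None
    # Earliest match wins; on a tie the longest phrase is preferred,
    # so "is a friend of" beats "is a".
    _, start, end = min(matches, key=lambda m: (m[1], m[1] - m[2]))
    first = 1 if doc[0].lower_ == "the" else 0  # skip a leading article
    last = start
    if start > 0 and doc[start - 1].lower_ in AUXILIARIES:
        last = start - 1
    return doc[first:last].text, doc[start:end].text, doc[end:].text


rows = [split_sentence(sentence) for sentence in sentences]
df = pd.DataFrame(rows, columns=["source", "edge", "target"])
print(df)
```

The phrase list keeps the behaviour deterministic, at the cost of not generalising to sentences whose predicate is not listed.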
3. Adding RML Mapping Rules
Translate the information in the DataFrame into suitable CSV files that serve as data sources for the RML mapping document "mapping.ttl". Note that the file "mapping.ttl" initially contains only the prefixes of the IRIs for the N-Triples.
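How the DataFrame is split into CSV files is a design choice. The sketch below assumes one file per edge label, with the columns `source` and `target`. The helper `write_edge_csvs`, the file naming scheme and the sample rows are hypothetical, and the temporary folder stands in for the project folder.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

# Sample rows only; the full DataFrame comes from the extraction step.
df = pd.DataFrame(
    [
        ("Robert", "known as", "Bob"),
        ("Robert", "born in", "Sydney"),
        ("Artificial Intelligence", "created by", "John McCarthy"),
    ],
    columns=["source", "edge", "target"],
)


def write_edge_csvs(frame, folder):
    """Write one CSV per edge label and return the created paths."""
    paths = []
    for edge, rows in frame.groupby("edge", sort=False):
        # "known as" -> "known_as.csv"
        filename = re.sub(r"\W+", "_", edge).strip("_") + ".csv"
        path = Path(folder) / filename
        rows[["source", "target"]].to_csv(path, index=False)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as folder:  # swap for the project folder
    created = write_edge_csvs(df, folder)
    names = [path.name for path in created]
    known_as = (Path(folder) / "known_as.csv").read_text(encoding="utf-8")

print(names)
print(known_as)
```

One file per edge keeps each logical source in the mapping simple, because every row of a file then maps to the same predicate.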
Add RML mapping rules to the file "mapping.ttl" that transform the information in the CSV files into N-Triples notation. Once the RML mapping rules are defined, you can launch the transformation in the following way from your Python program:
import os
os.system('python -m rdfizer -c ./config.ini')
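As an alternative sketch, `subprocess.run` makes it easier to check whether the command succeeded and to inspect its output. The helper `run_command` is hypothetical, and the demo uses a harmless stand-in command so that it runs without SDM-RDFizer installed.

```python
import subprocess
import sys


def run_command(command):
    """Run `command` and return (exit_code, stdout, stderr) as text."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


# For the assignment the call would mirror the os.system line, e.g.
#   run_command([sys.executable, "-m", "rdfizer", "-c", "./config.ini"])
# Demo with a stand-in command:
code, out, err = run_command([sys.executable, "-c", "print('ok')"])
if code != 0:
    print("command failed:", err)
else:
    print("command succeeded:", out.strip())
```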
If the transformation was successful, then the file "triples.nt" will contain the following N-Triples:
You can visualise the resulting N-Triples as a connected graph. You don't have to generate this graphical representation for this assignment, but the graph may help you to inspect the triples when you develop the mapping rules. I used the online RDF Grapher for this purpose.
4. Displaying N-Triples in Turtle Notation
Use Python's rdflib library to read the file "triples.nt" and display the triples of the knowledge graph in Turtle notation. The output should look like the attached example.
5. Querying the RDF Knowledge Graph
Add four different SPARQL queries of your choice to the Python code that find the correct answers in the RDF knowledge graph and display these answers in JSON notation. A possible answer may look as follows in JSON:
{"results": {"bindings": [{"name": {"type": "literal", "value": "Artificial Intelligence"}}]}, "head": {"vars": ["name"]}}
6. Producing a Video
Produce a 3-minute video ("student.mp4") that presents your implementation. In this video, you should walk the viewer through the code of your Python program and your RML mapping rules and explain the details of your implementation in your own words. Focus on those parts of the implementation that are novel and haven't already been discussed in the workshops of Weeks 7 and 8. You can use the free screen recorder FlashBack Express to produce your video.