Retrieve a number of simple sentences

Reference no: EM132890915

1. Introduction

For this assignment you have to retrieve a number of simple sentences from an existing web page, extract the relevant information from these sentences and transform this information into an RDF knowledge graph with the help of the RDF Mapping Language (RML). Once this RDF knowledge graph is available, you have to display it in Turtle notation. You also have to write a number of SPARQL queries of your choice that extract information from the knowledge graph. Finally, you have to produce a 3-minute video that explains the details of your Python implementation and how you built the RML mapping rules.

Please download the folder "comp3220-assigment-3.zip" to start with this assignment. This folder contains a version of SDM-RDFizer, a configuration file "config.ini" for the RDFizer, an incomplete file "mapping.ttl" to which the RML mapping rules have to be added, and an HTML file "student.html" that contains the information to be extracted.

2. Extracting Information
Fetch the following web page ("student.html") from a local web server via an HTTP request:

To do this, open a command prompt, go to the folder that contains the HTML file "student.html", and start a simple Python HTTP server:
C:\>python -m http.server 8080

The Python program "student.py" should request the HTML document ("student.html") from this server via:
http://localhost:8080/student.html

You have to use the Python "requests" module for this task.
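A minimal sketch of the request step, assuming the server started above is listening on port 8080 and serves plain HTTP (which is what "python -m http.server" does when started as shown). The function name fetch_html and the timeout value are illustrative choices, not part of the assignment.

```python
import requests


def fetch_html(url: str, timeout: float = 5.0) -> str:
    """Return the body of the page at `url` as text.

    `fetch_html` and the 5-second timeout are hypothetical choices for this sketch.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx answers instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

While the server is running, calling fetch_html("http://localhost:8080/student.html") should return the HTML source of "student.html" as a string.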
Afterwards, use the BeautifulSoup4 library to extract the raw text from the HTML file "student.html", then use spaCy to split this text into sentences and store the sentences in a list:
['Robert is a person.',
'Robert is a friend of Alice.',
'Robert is born on 1998/07/04.',
'Robert was born in Sydney.',
'Robert is known as Bob.',
'Robert is interested in Artificial Intelligence.',
'Artificial Intelligence was created by John McCarthy.',
"The lecture 'Deep Learning' is about Artificial Intelligence."]
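The text extraction and sentence splitting could be sketched as follows. This assumes that a blank English spaCy pipeline with the rule-based "sentencizer" component is sufficient for short sentences that end in a full stop; a trained model would work as well. The function name extract_sentences and the sample HTML are made up for illustration and do not reproduce the real "student.html".

```python
import spacy
from bs4 import BeautifulSoup


def extract_sentences(html: str) -> list[str]:
    """Strip the markup from `html` and split the remaining text into sentences."""
    # "html.parser" ships with Python, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    raw_text = soup.get_text(separator=" ", strip=True)

    # Assumption: a blank pipeline plus the rule-based sentencizer is enough
    # for short, full-stop-terminated sentences like those in this task.
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")
    doc = nlp(raw_text)
    return [sent.text.strip() for sent in doc.sents]


# Hypothetical stand-in for the content of "student.html".
sample_html = (
    "<html><body><p>Robert is a person.</p>"
    "<p>Robert is known as Bob.</p></body></html>"
)
print(extract_sentences(sample_html))
```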

For each of these eight sentences extract the subject (source), the predicate (edge), and the object (target) and store the resulting information in a pandas DataFrame as shown below. Again, you should use spaCy to extract the relevant information from these sentences.
   source                     edge                target
0  Robert                     is a                person
1  Robert                     is a friend of      Alice
2  Robert                     born on             1998/07/04
3  Robert                     born in             Sydney
4  Robert                     known as            Bob
5  Robert                     is interested in    Artificial Intelligence
6  Artificial Intelligence    created by          John McCarthy
7  lecture 'Deep Learning'    is about            Artificial Intelligence
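One possible way to obtain such a table is sketched below. Note that this is a simplification and not necessarily the intended solution: instead of analysing a dependency parse (which needs a trained spaCy model), it matches each sentence against a hand-written list of edge phrases with spaCy's PhraseMatcher. The names EDGE_PHRASES, AUXILIARIES and extract_triple are hypothetical, and the edge list is simply copied from the table above.

```python
import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

# Hypothetical edge vocabulary, copied from the "edge" column of the table above.
EDGE_PHRASES = ["is a", "is a friend of", "born on", "born in", "known as",
                "is interested in", "created by", "is about"]
AUXILIARIES = {"is", "was"}  # dropped when they trail the subject ("Robert was")

nlp = spacy.blank("en")  # tokenizer only; no trained model is required
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("EDGE", [nlp.make_doc(phrase) for phrase in EDGE_PHRASES])


def extract_triple(sentence: str):
    """Return (source, edge, target) for one sentence, or None if no edge matches."""
    doc = nlp(sentence)
    spans = [doc[start:end] for _, start, end in matcher(doc)]
    if not spans:
        return None
    # filter_spans keeps the longest of overlapping matches, so
    # "is a friend of" wins over "is a"; then take the leftmost match.
    edge = min(filter_spans(spans), key=lambda span: span.start)

    start, end = 0, edge.start
    if start < end and doc[start].lower_ == "the":
        start += 1  # drop a leading article
    while end > start and doc[end - 1].lower_ in AUXILIARIES:
        end -= 1  # drop auxiliaries between subject and edge
    target_end = len(doc)
    while target_end > edge.end and doc[target_end - 1].is_punct:
        target_end -= 1  # drop the final full stop
    return doc[start:end].text, edge.text, doc[edge.end:target_end].text


sentences = ["Robert is a friend of Alice.", "Robert was born in Sydney."]
rows = [triple for triple in map(extract_triple, sentences) if triple is not None]
df = pd.DataFrame(rows, columns=["source", "edge", "target"])
print(df)
```

A fixed phrase list only covers sentence patterns that are known in advance; an approach based on the dependency labels of a trained model would generalise better, at the cost of an additional model download.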

3. Adding RML Mapping Rules
Take the DataFrame and write its contents to suitable CSV files that serve as data sources for the RML mapping document "mapping.ttl". Note that the file "mapping.ttl" initially contains only the prefixes of the IRIs for the N-Triples:

Add RML mapping rules to the file "mapping.ttl" that transform the information in the CSV files into N-Triples notation. Once the RML mapping rules are defined, you can launch the transformation from your Python program in the following way:
import os
os.system('python -m rdfizer -c ./config.ini')
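The two steps above, writing the CSV sources and launching the RDFizer, could be sketched as follows. The layout of one CSV file per edge type and the file naming scheme are assumptions made for this example; the assignment leaves the choice of "suitable" files open. The helper names write_edge_csvs and run_rdfizer are hypothetical, and run_rdfizer uses subprocess instead of os.system only so that the exit code can be checked.

```python
import subprocess
import sys
from pathlib import Path

import pandas as pd


def write_edge_csvs(df: pd.DataFrame, out_dir: Path) -> list[Path]:
    """Write one CSV file per distinct edge, each with a source and a target column."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for edge, rows in df.groupby("edge", sort=True):
        # Hypothetical naming scheme: "born in" -> "born_in.csv".
        path = out_dir / (edge.replace(" ", "_") + ".csv")
        rows[["source", "target"]].to_csv(path, index=False)
        paths.append(path)
    return paths


def run_rdfizer(config_path: str = "./config.ini") -> bool:
    """Run SDM-RDFizer like the os.system call above; True if it exited with code 0."""
    completed = subprocess.run([sys.executable, "-m", "rdfizer", "-c", config_path])
    return completed.returncode == 0


# Small sample in the shape of the DataFrame from the previous section.
df = pd.DataFrame(
    [("Robert", "born in", "Sydney"), ("Robert", "known as", "Bob")],
    columns=["source", "edge", "target"],
)
```

Calling write_edge_csvs(df, Path("data")) would create "data/born_in.csv" and "data/known_as.csv", which the logical sources in "mapping.ttl" can then refer to. Whether the paths have to be relative to "config.ini" depends on the RDFizer configuration.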

If the transformation was successful, then the file "triples.nt" will contain the following N-Triples:

You can visualise the resulting N-Triples as a connected graph. You don't have to generate this graphical representation for this assignment, but the graph may help you to inspect the triples when you develop the mapping rules. I used the online RDF Grapher for this purpose.

4. Displaying N-Triples in Turtle Notation
Use Python's rdflib library to read the file "triples.nt" and display the triples of the knowledge graph in Turtle notation. The output should look as attached:

5. Querying the RDF Knowledge Graph

Add four different SPARQL queries of your choice to the Python code that find the correct answers in the RDF knowledge graph and display these answers in JSON notation. A possible answer may look as follows in JSON:
{"results": {"bindings": [{"name": {"type": "literal", "value": "Artificial Intelligence"}}]}, "head": {"vars": ["name"]}}

6. Producing a Video
Produce a 3-minute video ("student.mp4") that presents your implementation. In this video, you should walk the viewer through the code of your Python program and your RML mapping rules and explain the details of your implementation in your own words. Focus on those parts of the implementation that are novel and haven't already been discussed in the workshops of Weeks 7 and 8. You can use the free screen recorder FlashBack Express to produce your video.

Attachment:- Assignment specification.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd