Ontologies and Semantic Web

By Vy Tien

Working translations (Vietnamese)

  • Ontology: bản thể học
  • Semantic web: web ngữ nghĩa

Examples of misunderstood information

  • The September 11 attacks and Larry Silverstein
  • How should the term Runway Incursion be understood? It depends on the context.
    • Airline Operator: incorrectly entering an active runway (without clearance).
    • Civil Aviation Authority: unauthorized entry onto a runway.

Dataset

  • Defining a building

Conceptual models to Ontologies

Ontological Conceptual Modeling

  • a way to capture and explain meaning
  • must be understandable to non-experts
  • computable: the model can be used to infer new knowledge or to validate data
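
To make the "computable" requirement concrete, here is a minimal sketch of inference and validation over a toy model. The class names (Hospital, Building, PhysicalObject, Person) and the disjointness constraint are assumptions made for illustration; they are not taken from any real ontology.

```python
# Hypothetical toy model: each class maps to its direct superclass.
SUBCLASS_OF = {
    "Hospital": "Building",
    "Building": "PhysicalObject",
}

# Hypothetical constraint: nothing can be both a Building and a Person.
DISJOINT = {("Building", "Person")}


def superclasses(cls):
    """Return all ancestors of cls by following SUBCLASS_OF links upward."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def infer_types(asserted):
    """Inference: derive every class an individual belongs to."""
    inferred = set(asserted)
    for cls in asserted:
        inferred.update(superclasses(cls))
    return inferred


def is_consistent(types):
    """Validation: an individual must not belong to two disjoint classes."""
    return not any(a in types and b in types for a, b in DISJOINT)
```

Asserting only that something is a Hospital lets the model conclude it is also a Building and a PhysicalObject; asserting that the same individual is also a Person makes the data fail validation.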

Ontologies

Ontologies are formal specifications of a conceptualization.

Ontologies and Data integration

SNOMED-CT
Systematized Nomenclature of Medicine - Clinical Terms

  • ∼ 300k clinical concepts
  • international standard – adopted e.g. in UK, USA, Australia
  • uses ontology reasoning to classify/query the concepts

ihtsdotools

Current Web vs. Semantic Web

State of the art: semi-structured HTML or XML data. There are many search engines, such as Google, Yahoo and MSN. Many of them are invaluable, but because they rely only on keywords and/or some natural language preprocessing, their results contain many irrelevant hits that have to be filtered manually.

How can web search be made more efficient?

  • more expressive power for web designers to capture complexities: SW languages (RDF(S), OWL),
  • more efficient search engines to handle SW languages: new inference techniques for these languages,
  • better search engine interfaces: more expressive query languages

Meanwhile, the amount of (unstructured) data is steadily growing.

Ontologies and Semantic Web

  • ontology has many definitions, but let’s consider it a formal representation of complex domain knowledge that is shared with others to ensure intelligent system interoperability,
  • semantic web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. (Tim Berners-Lee, James Hendler and Ora Lassila, The Semantic Web, Scientific American, 2001)

Idea of Semantic Web

  • W3C web page - http://www.w3.org/2001/sw
  • The data format will be either RDF(S) or OWL,
  • Reasoners for RDF(S) can be used for partial derivation in OWL,
  • Reasoners for OWL can be used for derivation in RDF(S)
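
As a rough illustration of what an RDF(S) reasoner derives, the sketch below applies only the RDFS domain and range rules to a list of triples. The `ex:` names are hypothetical, and a real reasoner covers many more entailment rules than these two.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema: who can work at what.
SCHEMA = [
    ("ex:worksAt", "rdfs:domain", "ex:Person"),
    ("ex:worksAt", "rdfs:range", "ex:Organization"),
]

# Hypothetical data: a single asserted fact.
DATA = [("ex:alice", "ex:worksAt", "ex:acme")]


def rdfs_domain_range(schema, data):
    """Derive rdf:type triples from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in data:
        for prop, rule, cls in schema:
            if prop != p:
                continue
            if rule == "rdfs:domain":
                # The subject of the property is an instance of the domain class.
                inferred.add((s, RDF_TYPE, cls))
            elif rule == "rdfs:range":
                # The object of the property is an instance of the range class.
                inferred.add((o, RDF_TYPE, cls))
    return inferred
```

From the single asserted triple, the function derives that `ex:alice` is an `ex:Person` and that `ex:acme` is an `ex:Organization`, although neither type was stated explicitly.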

Unique Data Identification – URIs

  • URI: Uniform Resource Identifier
  • URL: a URI that can be resolved to content using a protocol (e.g. HTTP)
  • IRI: Internationalized Resource Identifier, a generalization of URI that allows non-ASCII characters; it is the standard identifier in OWL.
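
The relationship between these identifiers can be sketched with Python's standard library. The helper names below are made up for this example, and the conversion is simplified (it ignores user info in the authority part, for instance), so treat it as an illustration rather than a complete implementation of the IRI-to-URI mapping.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def is_resolvable_url(uri):
    """Rough check: treat http(s) identifiers as URLs a client can dereference."""
    return urlsplit(uri).scheme in {"http", "https"}


def iri_to_uri(iri):
    """Map an IRI to an ASCII-only URI: IDNA for the host, percent-encoding elsewhere."""
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    path = quote(parts.path, safe="/%")          # non-ASCII becomes UTF-8 percent escapes
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

For example, the IRI `http://example.org/café#me` maps to the URI `http://example.org/caf%C3%A9#me`, while an identifier such as `urn:isbn:0451450523` is a URI but not a URL under the rough check above.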

Open World Assumption

The Semantic Web has to handle incomplete knowledge, so it adopts the open world assumption.

Open World: cannot be proven ⇒ unknown
Closed World: cannot be proven ⇒ false

Statement: “John is a Man.”
Query: “Is Jack a Man?”
OWA Answer: “I don’t know.”
CWA Answer: “No.”
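
The same example as a small sketch. The fact base and function names are illustrative only; the set of explicitly negated facts is an added assumption, included to show that an open-world system answers "No" only when the negation itself is known.

```python
# Known positive facts: (individual, class).
FACTS = {("John", "Man")}

# Known negative facts (hypothetical; empty in the example above).
NEGATED_FACTS = set()


def ask_closed_world(individual, cls):
    """CWA: whatever cannot be proven is treated as false."""
    return "Yes" if (individual, cls) in FACTS else "No"


def ask_open_world(individual, cls):
    """OWA: whatever cannot be proven either way is unknown."""
    if (individual, cls) in FACTS:
        return "Yes"
    if (individual, cls) in NEGATED_FACTS:
        return "No"
    return "I don't know"
```

Both functions answer "Yes" for John; they differ only for Jack, about whom nothing has been stated.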

Linked Data

Web of Documents - WWW

  • webpage: readable by humans
  • identifier: IRI
  • transfer protocol: HTTP
  • unified language: HTML

Web of Data - Linked Data

  • webpage: readable by machines
  • identifier: IRI
  • transfer protocol: HTTP
  • unified language: RDF

Linked Data [Heath2011] is a method for publishing structured and interlinked data on the web, building on URI, HTTP and RDF technologies.
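
A minimal sketch of what "structured and interlinked" means in practice: data is expressed as subject-predicate-object triples whose terms are IRIs, and some objects point into other datasets. The people and the `other.example.com` dataset are hypothetical; the FOAF and OWL property IRIs are real vocabulary terms. The serializer handles only IRIs and plain string literals.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ALICE = "http://id.example.org/people/Alice"  # hypothetical resource

TRIPLES = [
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", "http://id.example.org/people/Bob"),
    (ALICE, OWL_SAME_AS, "http://other.example.com/staff/alice"),
]


def is_iri(term):
    """Treat http(s) strings as IRIs and everything else as a plain literal."""
    return term.startswith(("http://", "https://"))


def to_ntriples(triples):
    """Serialize triples in N-Triples syntax (IRIs and plain literals only)."""
    lines = []
    for s, p, o in triples:
        if is_iri(o):
            obj = f"<{o}>"
        else:
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def external_links(triples, own_prefix):
    """Return object IRIs that point outside the dataset's own namespace."""
    return [o for _, _, o in triples if is_iri(o) and not o.startswith(own_prefix)]
```

Calling `external_links(TRIPLES, "http://id.example.org/")` returns the one link that leaves the local dataset, which is the kind of link that turns isolated datasets into a web of data.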

Document vs. its Content

Ensure a proper distinction between a document and its content. Two URI schemes are commonly used:

  • Hash URIs: suited to small datasets that are unlikely to grow.
  • 303 URIs: suited to large datasets, with good performance.
    • 303 URIs have the form http://id.example.org/people/Alice
    • The HTTP server answers with a 303 redirect to the document that corresponds to the requested resource.
    • The HTTP client then makes a second request; based on its Accept headers, the RDF or HTML version is delivered.
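
The 303 flow can be simulated without a network. In this sketch the `doc.example.org` prefix, the stored representations and the function names are all assumptions made for illustration; the toy server also returns the same document for every path, which a real deployment would not do.

```python
RESOURCE_PREFIX = "http://id.example.org/"
DOCUMENT_PREFIX = "http://doc.example.org/"  # hypothetical document namespace

# Hypothetical representations of the document describing Alice.
REPRESENTATIONS = {
    "text/turtle": "<http://id.example.org/people/Alice> a <http://xmlns.com/foaf/0.1/Person> .",
    "text/html": "<html><body><h1>Alice</h1></body></html>",
}


def handle_request(url, accept="text/html"):
    """Toy server: return (status, headers, body) for a GET request."""
    if url.startswith(RESOURCE_PREFIX):
        # The resource itself is not a document, so redirect with 303 See Other.
        location = DOCUMENT_PREFIX + url[len(RESOURCE_PREFIX):]
        return 303, {"Location": location}, ""
    if url.startswith(DOCUMENT_PREFIX):
        # Content negotiation: pick the representation named in the Accept header.
        media_type = accept if accept in REPRESENTATIONS else "text/html"
        return 200, {"Content-Type": media_type}, REPRESENTATIONS[media_type]
    return 404, {}, ""


def fetch(url, accept):
    """Toy client: follow a single 303 redirect, repeating the Accept header."""
    status, headers, body = handle_request(url, accept)
    if status == 303:
        status, headers, body = handle_request(headers["Location"], accept)
    return status, headers, body
```

Requesting `http://id.example.org/people/Alice` with `Accept: text/turtle` first yields a 303 pointing at the document URL, and the second request returns the RDF version; a browser sending `text/html` would receive the HTML page instead.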

Linked data presentation

LodView, Marmotta, Callimachus, D2R, Pubby, etc.

Open Data

CKAN and DataHub

Open Data Levels

⋆ Available on the web (in whatever format) with an open licence, so that it counts as Open Data
⋆⋆ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
⋆⋆⋆ All the above, plus: a non-proprietary format (e.g. CSV instead of Excel)
⋆⋆⋆⋆ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
⋆⋆⋆⋆⋆ All the above, plus: link your data to other people’s data to provide context
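
Because each level requires all the previous ones, the rating can be computed by counting criteria until the first one that fails. The function and parameter names below are made up for this sketch.

```python
def open_data_stars(open_licence_on_web, machine_readable, non_proprietary,
                    uses_w3c_standards, linked_to_other_data):
    """Return the number of stars (0 to 5); each level requires all earlier ones."""
    criteria = [
        open_licence_on_web,
        machine_readable,
        non_proprietary,
        uses_w3c_standards,
        linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing level caps the rating, whatever comes after it
        stars += 1
    return stars
```

Under this scheme an openly licensed Excel sheet rates two stars, the same table published as CSV rates three, and a dataset without an open licence rates zero regardless of its format.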

Semantic web adopters

U.S. administration: data.gov
Czechia: data.gov.cz
