Ontologies and Semantic Web
Vietnamese terms (rough translation)
- Ontology: bản thể học
- Semantic web: web ngữ nghĩa
Examples of misinterpreted information
- The 9/11 attacks and Larry Silverstein
- How is the term Runway Incursion understood? It depends on the context.
- Airline Operator: entering an active runway incorrectly (without clearance).
- Civil Aviation Authority: unauthorized entry onto a runway.
Dataset
- Definition of a building
Conceptual models to Ontologies
Ontological Conceptual Modeling
- a way to capture and explain meaning
- must be understandable to non-experts
- computable: the model can be used to infer new knowledge or to validate data
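The "computable" requirement can be illustrated with a minimal sketch. The toy class hierarchy, the disjointness axiom, and the function names below are assumptions made up for this illustration; they are not taken from any real ontology or reasoner.

```python
# Hypothetical toy ontology: subclass axioms and one disjointness axiom.
SUBCLASS_OF = {
    "Hospital": "Building",
    "Building": "PhysicalObject",
}
DISJOINT = {frozenset({"Building", "Person"})}


def superclasses(cls):
    """Return every class reachable from cls via subclass-of links."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def infer_types(asserted):
    """Infer new knowledge: add all superclasses of the asserted types."""
    inferred = set(asserted)
    for cls in asserted:
        inferred.update(superclasses(cls))
    return inferred


def is_consistent(types):
    """Validate data: one individual may not belong to two disjoint classes."""
    return not any(pair <= types for pair in DISJOINT)


print(sorted(infer_types({"Hospital"})))
# ['Building', 'Hospital', 'PhysicalObject']
print(is_consistent(infer_types({"Hospital"})))            # True
print(is_consistent(infer_types({"Hospital", "Person"})))  # False
```

Real ontology reasoners support far richer axioms; the sketch only shows that both inference and validation follow mechanically from the model.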
Ontologies
are formal specifications of a conceptualization.
Ontologies and Data integration
SNOMED-CT
Systematized Nomenclature of Medicine - Clinical Terms
- ∼ 300k clinical concepts
- international standard, adopted e.g. in the UK, USA, and Australia
- uses ontology reasoning to classify/query the concepts
Current Web vs. Semantic Web
State of the art: semi-structured HTML or XML data. There is a vast number of search engines, such as Google, Yahoo, and MSN. Many of them are invaluable, but because they rely only on keywords and/or some natural-language preprocessing, their results include many irrelevant hits that have to be filtered manually.
How to make web search more efficient?
- more expressive power for web designers to capture complexities: SW languages (RDF(S), OWL),
- more efficient search engines to handle SW languages: new inference techniques for these languages,
- better search engine interfaces: more expressive query languages.
The amount of (unstructured) data is steadily growing.
Ontologies and Semantic Web
- ontology has many definitions, but let’s consider it a formal representation of complex domain knowledge that is shared with others to ensure intelligent system interoperability,
- semantic web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. (cited from: The Semantic Web, Tim Berners-Lee, James Hendler and Ora Lassila, Scientific American, 2001)
Idea of Semantic Web
- W3C web page - http://www.w3.org/2001/sw
- The data format will be either RDF(S) or OWL,
- Reasoners for RDF(S) can be used for partial derivation in OWL,
- Reasoners for OWL can be used for derivation in RDF(S)
Unique Data Identification – URIs
- URI: Uniform Resource Identifier
- URL: Uniform Resource Locator, a URI that can be resolved to content using a protocol (e.g. HTTP)
- IRI: Internationalized Resource Identifier, a generalization of URIs that allows Unicode characters; it is the standard identifier in OWL.
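The relationship between an IRI and a URI can be sketched with Python's standard library. The helper name `iri_to_uri` and the example address are hypothetical, and the sketch assumes the host name is plain ASCII (non-ASCII host names need separate IDNA handling, which is not shown).

```python
from urllib.parse import quote, urlsplit


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters of an IRI as UTF-8 bytes.

    Assumption: the host part is already ASCII. URI delimiters and
    existing percent signs are listed as safe and left untouched.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


iri = "http://example.org/people/Zoë"  # hypothetical IRI
uri = iri_to_uri(iri)
print(uri)  # http://example.org/people/Zo%C3%AB

# The result is an ordinary URI; because its scheme is http,
# it is also a URL that a client could try to resolve.
parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.path)
```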
Open World Assumption
The Semantic Web ⇒ handles incomplete knowledge
Open World: cannot be proven ⇒ unknown
Closed World: cannot be proven ⇒ false
Statement: “John is a Man.”
Query: “Is Jack a Man?”
OWA Answer: “I don’t know.”
CWA Answer: “No.”
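The John/Jack example can be replayed in a few lines. This is only a sketch: the fact set and the two function names are invented for illustration, and real reasoners derive answers from axioms rather than from a lookup.

```python
# The only known statement: "John is a Man."
FACTS = {("John", "Man")}


def ask_closed_world(individual, cls):
    """CWA: whatever cannot be proven is treated as false."""
    return "Yes" if (individual, cls) in FACTS else "No"


def ask_open_world(individual, cls, known_negatives=frozenset()):
    """OWA: whatever can be neither proven nor refuted stays unknown."""
    if (individual, cls) in FACTS:
        return "Yes"
    if (individual, cls) in known_negatives:
        return "No"
    return "I don't know"


print(ask_open_world("Jack", "Man"))    # I don't know
print(ask_closed_world("Jack", "Man"))  # No
print(ask_open_world("John", "Man"))    # Yes
```

Under the open world assumption a "No" is only returned when the negative statement is itself known, here modeled by the hypothetical `known_negatives` parameter.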
Linked Data
Web of Documents - WWW
- web pages: human-readable
- identifiers: IRI
- transfer protocol: HTTP
- unified language: HTML
Web of Data - Linked Data
- web pages: machine-readable
- identifiers: IRI
- transfer protocol: HTTP
- unified language: RDF
Linked Data [Heath2011] is a method for publishing structured and interlinked data on the web, building on URI, HTTP and RDF technologies.
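A minimal sketch of what "structured and interlinked" means in practice: data is a set of subject-predicate-object triples whose identifiers are IRIs, and a link is simply a triple whose object lives in another dataset. The people IRIs below are hypothetical, the two predicates come from the FOAF vocabulary, and marking literals by a leading double quote is a simplification made for this example.

```python
ALICE = "http://id.example.org/people/Alice"  # hypothetical resource
BOB = "http://other.example.net/people/Bob"   # hypothetical, another dataset
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Simplification: an object starting with a double quote is a literal,
# anything else is treated as an IRI.
TRIPLES = [
    (ALICE, FOAF_NAME, '"Alice"'),
    (ALICE, FOAF_KNOWS, BOB),
]


def to_ntriples(triples):
    """Serialize triples in the line-based N-Triples syntax."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def outgoing_links(triples, own_prefix):
    """Return object IRIs that point outside this dataset."""
    return [
        o for _, _, o in triples
        if not o.startswith('"') and not o.startswith(own_prefix)
    ]


print(to_ntriples(TRIPLES))
print(outgoing_links(TRIPLES, "http://id.example.org/"))
```

A client that dereferences the outgoing IRI over HTTP would receive more triples, which is how separate datasets become one web of data.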
Document vs. its Content
Ensure a proper distinction between a document and its content.
- Hash URIs: suitable for small datasets that rarely grow.
- 303 URIs: suitable for large datasets, good performance.
- 303 URIs are of the form http://id.example.org/people/Alice
- The HTTP server sends a 303 redirect to the document that corresponds to the requested resource.
- The HTTP client makes another request; based on its Accept header, the RDF or HTML version is delivered.
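The 303 exchange described above can be simulated without a network. Everything here is an assumption for illustration: the `/doc` path prefix, the `handle` and `fetch` helpers, and the substring check on the Accept header (real content negotiation parses media ranges and q-values).

```python
def handle(path, accept="text/html"):
    """Return (status, headers) for a GET on a hypothetical server."""
    if path.startswith("/people/"):
        # The URI names a thing (a person), not a document: redirect.
        return 303, {"Location": "/doc" + path}
    if path.startswith("/doc/people/"):
        # The document about the thing: choose a representation.
        wants_rdf = "text/turtle" in accept
        content_type = "text/turtle" if wants_rdf else "text/html"
        return 200, {"Content-Type": content_type}
    return 404, {}


def fetch(path, accept):
    """Client side: follow a single 303 redirect with the same Accept."""
    status, headers = handle(path, accept)
    if status == 303:
        status, headers = handle(headers["Location"], accept)
    return status, headers


print(handle("/people/Alice", "text/turtle"))
# (303, {'Location': '/doc/people/Alice'})
print(fetch("/people/Alice", "text/turtle"))
# (200, {'Content-Type': 'text/turtle'})
print(fetch("/people/Alice", "text/html"))
# (200, {'Content-Type': 'text/html'})
```

The identifier of the person never returns a 200 response itself, which keeps the person and the document about the person distinct.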
Linked data presentation
LodView, Marmotta, Callimachus, D2R, Pubby, etc.
Open Data
CKAN and DataHub
Open Data Levels
⋆ Available on the web (in whatever format) under an open licence, which is what makes it Open Data
⋆⋆ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
⋆⋆⋆ All the above, plus: non-proprietary format (e.g. CSV instead of Excel)
⋆⋆⋆⋆ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
⋆⋆⋆⋆⋆ All the above, plus: link your data to other people’s data to provide context
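Because each level requires all the previous ones, the rating is a simple cumulative check. The function and parameter names below are hypothetical; the sketch only encodes the five criteria listed above.

```python
def open_data_stars(open_licence, machine_readable, non_proprietary,
                    uses_w3c_standards, linked):
    """Count stars: stop at the first criterion that is not met."""
    criteria = [open_licence, machine_readable, non_proprietary,
                uses_w3c_standards, linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Scanned table published under an open licence.
print(open_data_stars(True, False, False, False, False))  # 1
# Excel spreadsheet under an open licence.
print(open_data_stars(True, True, False, False, False))   # 2
# CSV file under an open licence.
print(open_data_stars(True, True, True, False, False))    # 3
# RDF that also links to other datasets.
print(open_data_stars(True, True, True, True, True))      # 5
```

Note that a dataset without an open licence gets zero stars regardless of its format.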
Semantic web adopters
U.S. administration: data.gov
Czechia: data.gov.cz