Connecting Data on the Web, by Anonymous
Assessing Open Access of Repositories (2013-08-12)
<div itemtype="http://schema.org/BlogPosting">
<p>This blog post is part of an assignment for the <a href="https://p2pu.org/en/courses/5/open-science-an-introduction/" itemtype="http://schema.org/Offer">Open Science course</a> offered by the <a href="https://p2pu.org/en/" itemprop="url" itemtype="http://schema.org/CollegeOrUniversity"><span itemprop="name">P2P University</span></a>. In this week's assignment, we were asked to assess the openness of the following three repositories:
<ul>
<li>Conservation Biology Institute's (CBI) <a href="http://databasin.org/datasets/" itemprop="url" itemtype="http://schema.org/DataCatalog">Data Basin</a>;</li>
<li><a href="http://www.ccdc.cam.ac.uk/pages/Home.aspx" itemprop="url" itemtype="http://schema.org/DataCatalog">Cambridge Structural Database</a>; and</li>
<li><span itemtype="http://schema.org/GovernmentOrganization" itemprop="name">NASA</span>'s <a href="http://lsda.jsc.nasa.gov/" itemprop="url" itemtype="http://schema.org/DataCatalog">Life Science Data Repositories</a>.</li>
</ul>
</p>
<p>The Data Basin repository provides environmental information, such as physical locations and qualitative and quantitative measurements. Although the website allows unregistered users to search and visualize datasets, an account is required to contribute datasets. The default license for datasets is the Creative Commons Attribution license (i.e. <a href="http://creativecommons.org/licenses/by/3.0/" itemprop="url" itemtype="http://schema.org/TechArticle">CC BY</a>), but contributors can choose a less open license instead. Of the three repositories investigated, this is the one that best meets the principles of <span itemprop="keywords">open data</span>.</p>
<p>The Cambridge Structural Database contains data about small-molecule organic and metal-organic crystal structures. The data in the database is copyrighted by the <a href="http://www.ccdc.cam.ac.uk/pages/Home.aspx" itemtype="http://schema.org/NGO" itemprop="url"><span itemprop="name">Cambridge Crystallographic Data Centre</span></a>. Use of the data is restricted to research and academic purposes: it cannot be republished or used commercially. The license agreement even stipulates that downloaded data must be deleted within 14 days. In other words, access to the data is far from open.</p>
<p>The Life Science Data Repositories provide information about space missions and the experiments conducted during them. Although the website offers a search tool for browsing dataset descriptions, the actual data is protected under the <a href="http://www.law.cornell.edu/uscode/text/5/552a" itemtype="http://schema.org/TechArticle" itemprop="url">Privacy Act of 1974 (<span itemprop="citation">5 U.S.C. §552a</span>)</a>. However, users can request access to the data under the <a href="http://www.law.cornell.edu/uscode/text/5/552" itemtype="http://schema.org/TechArticle" itemprop="url">Freedom of Information Act (<span itemprop="citation">5 U.S.C. §552</span>)</a>.</p>
</div>
What is Open Data? (2013-08-12)
<div id="main-wrapper" itemtype="http://schema.org/BlogPosting">
<p>The <a href="http://okfn.org/" itemtype="http://schema.org/Organization" itemprop="url">Open Knowledge Foundation</a> summarizes "<i><span itemprop="keywords">Open Data</span></i>" as
<blockquote cite="http://opendefinition.org/"><i>A piece of data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.</i></blockquote>
More specifically, the open data movement aims to make data (e.g. scientific datasets) freely available for people and organizations to reuse and republish as they wish under <a href="http://linked-data.blogspot.com/2013/08/re-using-open-access-content.html" itemtype="http://schema.org/BlogPosting" itemprop="url">open access licenses</a>. Note that it is recommended to include any additional data generated through analysis as part of the republished dataset.
</p>
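<p>To make the definition above concrete, here is a minimal Python sketch that checks whether a Creative Commons license satisfies it: a license counts as open only if its conditions are limited to attribution (BY) and share-alike (SA), so the non-commercial (NC) and no-derivatives (ND) variants fail. The input format (codes such as "CC BY-NC-SA") and the function names are assumptions made for illustration, not part of any library.</p>

```python
# A minimal sketch, assuming licenses are written as short Creative Commons
# codes such as "CC BY" or "CC BY-NC-SA" (a hypothetical input format).

# Conditions a Creative Commons license can attach to a work.
KNOWN_CONDITIONS = {"BY", "SA", "NC", "ND"}

# The Open Definition tolerates only attribution (BY) and share-alike (SA).
ALLOWED_CONDITIONS = {"BY", "SA"}


def parse_cc_license(code: str) -> set[str]:
    """Return the set of conditions in a license code like 'CC BY-NC-SA'."""
    code = code.strip().upper()
    if code.startswith("CC "):
        code = code[len("CC "):]
    conditions = set(code.split("-"))
    unknown = conditions - KNOWN_CONDITIONS
    if unknown:
        raise ValueError(f"unknown license condition(s): {sorted(unknown)}")
    return conditions


def is_open(code: str) -> bool:
    """True if the license asks for nothing beyond attribution and/or share-alike."""
    return parse_cc_license(code) <= ALLOWED_CONDITIONS


for license_code in ["CC BY", "CC BY-SA", "CC BY-NC", "CC BY-NC-ND"]:
    print(license_code, "open" if is_open(license_code) else "not open")
# CC BY open
# CC BY-SA open
# CC BY-NC not open
# CC BY-NC-ND not open
```

<p>Treating a license as a set of conditions reduces the openness test to a subset check, which mirrors the wording "subject only, at most, to the requirement to attribute and/or share-alike".</p>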
<p>Over the last few years, governments and scientific institutions have published a wide range of datasets. The <a href="http://www.data.gov/" itemtype="http://schema.org/DataCatalog" itemprop="url">data.gov</a> website has published a list of 294 open data sites across 50 countries, including the <span itemtype="http://schema.org/Country" itemprop="name">United States</span> and <span itemtype="http://schema.org/Country" itemprop="name">Belgium</span>, covering domains such as science, <span itemprop="keywords">government</span>, and <span itemprop="keywords">economics</span>. <span itemtype="http://schema.org/Person"><span itemprop="givenName">Ben</span> <span itemprop="familyName">Jones</span></span> has created a <a href="http://dataremixed.com/2013/08/worldwide-open-data-sites/" itemtype="http://schema.org/WebApplication" itemprop="url">tool</a> for navigating these sites easily.</p>
</div>
Re-using Open Access Content (2013-08-05)
<div id="main-wrapper" itemscope="" itemtype="http://schema.org/BlogPosting">
As part of my work in the <span itemprop="keywords">legal publishing industry</span>, we were asked to gather information about companies (e.g. registered name, address, homepage, and stock exchange ticker symbol). Creating this <span itemprop="keywords">dataset</span> from scratch was such a daunting task that I started investigating how to reuse existing datasets on the Web.<br />
Because the data had to be represented in <a href="http://www.w3.org/TR/rdf-schema/" itemprop="url" itemtype="http://schema.org/TechArticle">RDF</a>, we started by looking at open <span itemprop="keywords">linked data</span> repositories (namely <a href="http://dbpedia.org/About" itemprop="url" itemtype="http://schema.org/DataCatalog">DBpedia</a>, <a href="https://developers.google.com/freebase/index" itemprop="url" itemtype="http://schema.org/DataCatalog">Freebase</a>, and the <a href="http://data.nytimes.com/" itemprop="url" itemtype="http://schema.org/DataCatalog">New York Times</a>). DBpedia extracts information from <a href="http://en.wikipedia.org/wiki/" itemprop="url" itemtype="http://schema.org/WebPage">Wikipedia</a> as structured data and is available under the Creative Commons Attribution-ShareAlike (CC BY-SA) license. Freebase is another repository of structured data and is used by Google to power its <a href="http://www.google.com/insidesearch/features/search/knowledge.html" itemprop="url" itemtype="http://schema.org/TechArticle">Knowledge Graph</a> feature; its content is available under the Creative Commons Attribution (CC BY) license. However, our analysis of these repositories showed that (i) the information was not expressed consistently, (ii) it was often incomplete, and (iii) some required information (e.g. stock exchange ticker symbols) was missing.<br />
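<p>To illustrate what representing the company data in RDF can look like, here is a minimal Python sketch that serializes one company record as N-Triples. It assumes schema.org terms (<code>Corporation</code>, <code>legalName</code>, <code>address</code>, <code>url</code>, <code>tickerSymbol</code>) as the vocabulary; the <code>Company</code> class, the example.org identifier, and the sample values are hypothetical and are not the schema used in the project described here.</p>

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"  # assumed vocabulary for this sketch


@dataclass
class Company:
    """Hypothetical record holding the fields we needed for each company."""
    iri: str         # identifier minted for the company (assumed to be a valid IRI)
    legal_name: str
    address: str
    homepage: str    # assumed to be a valid IRI, so it is not escaped
    ticker: str


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(company: Company) -> list[str]:
    """Describe a company with schema.org terms, one N-Triples statement per line."""
    s = f"<{company.iri}>"
    return [
        f"{s} <{RDF_TYPE}> <{SCHEMA}Corporation> .",
        f"{s} <{SCHEMA}legalName> {literal(company.legal_name)} .",
        f"{s} <{SCHEMA}address> {literal(company.address)} .",
        f"{s} <{SCHEMA}url> <{company.homepage}> .",
        f"{s} <{SCHEMA}tickerSymbol> {literal(company.ticker)} .",
    ]


# Hypothetical sample record.
example = Company(
    iri="http://example.org/company/example-corp",
    legal_name="Example Corp",
    address="1 Main Street, Springfield",
    homepage="http://www.example.com/",
    ticker="EXMP",
)
print("\n".join(to_ntriples(example)))
```

<p>Writing N-Triples by hand keeps the sketch dependency-free; in practice an RDF library such as rdflib would take care of escaping and IRI validation.</p>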
As a result, we searched with Google to determine whether there were more suitable datasets for our problem. Of the hundreds of repositories mentioned, we performed an in-depth analysis of the <a href="http://api.corpwatch.org/" itemprop="url" itemtype="http://schema.org/WebApplication">CorpWatch API</a> and the <a href="http://opencorporates.com/" itemprop="url" itemtype="http://schema.org/DataCatalog">OpenCorporates repository</a>. The CorpWatch API is funded by the <a href="http://sunlightfoundation.com/" itemprop="url" itemtype="http://schema.org/NGO">Sunlight Foundation</a>. Its dataset is built by extracting company information from the 10-K filings that companies submit to the Securities and Exchange Commission, and it is provided as structured data through the API. Although the data is not released under any particular license, the copyright holders request that contributions to the data be made public. The OpenCorporates repository contains information on more than 55,000,000 companies worldwide and was the most complete in terms of the data available. Its content is available under the Creative Commons Attribution-ShareAlike (CC BY-SA) license for non-commercial use.</div>
Introduction to Open Access (2013-08-05)
<div id="main-wrapper" itemscope="" itemtype="http://schema.org/BlogPosting">
<span itemprop="keywords">Open Access</span> (OA) is the practice of providing unrestricted access to, and use of, content (e.g. research data, academic publications, government data) via the <span itemprop="keywords">World Wide Web</span>. For instance, the <a href="http://www.doaj.org/" itemprop="url" itemtype="http://schema.org/WebPage">Directory of Open Access Journals</a> provides free access to 9,948 journals covering a wide range of domains, such as Law and Political Science, Computer Science, and Agriculture.<br />
Although the modern OA movement can be traced to the mid-1960s, its principles can be attributed to <a href="http://en.wikipedia.org/wiki/Paul_Otlet" itemtype="http://schema.org/Person"><span itemprop="givenName">Paul</span> <span itemprop="familyName">Otlet</span></a> (<span itemprop="birthDate">1868</span>-<span itemprop="deathDate">1944</span>), who in 1895 began creating an open repository of facts called the <a href="http://archives.mundaneum.org/en/universal-bibliographic-repertory" itemprop="url" itemtype="http://schema.org/WebPage">Universal Bibliographic Repertory</a>. The following year, he launched a mail-based question-answering service drawing on the 400,000 facts accumulated so far. Today, the repository contains over 15 million facts, and the service is seen as a precursor to <a href="http://www.youtube.com/watch?v=hSyfZkVgasI&noredirect=1" itemprop="keywords" itemtype="http://schema.org/VideoObject">online search</a>.<br />
With the advent of the World Wide Web in the mid-1990s, open access to <span itemprop="keywords">scholarly material</span> became more prominent. For instance, <a href="http://citeseerx.ist.psu.edu/index" itemprop="url" itemtype="http://schema.org/WebPage">CiteSeer</a> launched a website offering free search of scientific and academic papers. Although it did not always provide access to the full paper, it offered a database of bibliographic information (e.g. citations). In the last few years, the movement has gained even more prominence, with many countries issuing manifestos to make government data publicly available. In June 2013, the G8 published a <a href="https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex" itemprop="url" itemtype="http://schema.org/TechArticle">charter</a> committing its members to provide open government data to their citizens.<br />
From a legal perspective, the OA movement has been made possible by the expiration of <span itemprop="keywords">copyrights</span> or by copyright holders agreeing to make content freely available. Permission to access and reuse content can be expressed through one of the <a href="http://creativecommons.org/" itemprop="url" itemtype="http://schema.org/Organization">Creative Commons</a> licenses. For instance, the Attribution license (i.e. <a href="http://creativecommons.org/licenses/by/3.0/" itemprop="url" itemtype="http://schema.org/TechArticle">CC BY</a>) allows third parties to distribute, remix, tweak, and build upon someone's work as long as they credit the original source. Thanks to this licensing model, open access content can be legally shared and reused. The image below describes the different types of Creative Commons licenses. Note that the image was originally part of an <a href="http://www.masternewmedia.org/how-to-publish-a-book-under-a-creative-commons-license/">article</a> on how to publish a book under a Creative Commons license. <a href="http://d3sdoylwcs36el.cloudfront.net/creative-commons-license-types-pros-cons.gif" imageanchor="1" ><img border="0" src="http://d3sdoylwcs36el.cloudfront.net/creative-commons-license-types-pros-cons.gif" /></a></div>