LinkedSpending: OpenSpending becomes Linked Open Data

Abstract

There is high public demand for more transparency in government spending. Open spending data has the power to reduce corruption by increasing accountability, and it strengthens democracy because voters can make better informed decisions. An informed and trusting public also strengthens the government itself because it is more likely to commit to large projects. OpenSpending.org is an open platform that provides public finance data from governments around the world. In this article, we present its RDF conversion LinkedSpending, which provides more than five million planned and carried out financial transactions in 627 data sets from all over the world from 2005 to 2035 as Linked Open Data. This data is represented in the RDF Data Cube vocabulary and is freely available and openly licensed.

Keywords

Government, transparency, finance, budget, openspending, rdf, public expenditure, Open Data

1. Introduction

A W3C design issue [4] motivates making government data available online as Linked Data for three reasons: “1) Increasing citizen awareness of government functions to enable greater accountability; 2) Contributing valuable information about the world; and 3) Enabling the government, the country, and the world to function more efficiently.” Increasing the transparency of government spending specifically is in high demand from the public. For instance, in the survey publication [13], “Public access to records is crucial to the functioning government” was rated with a mean of 4.14 (1 = disagree completely, 5 = agree completely). Open spending data can reduce corruption by increasing accountability and strengthening democracy because voters can make better informed decisions. Furthermore, an informed and trusting public also strengthens the government itself because it is more likely to commit to large projects (see [1] for details).

Several states and unions are bound to financial transparency by law, such as the European Union¹ with its Financial Transparency System (FTS)² [11]. Public spending services satisfy basic information needs, but in their current form they allow neither queries that go beyond simple keyword search nor queries that need data from more than one system. Linked Data solves those problems by providing a universal format, a powerful query language and the possibility of integration with linked data sets from other services.

¹ “2. The Commission shall make available, in an appropriate and timely manner, information on recipients, as well as the nature and purpose of the measure financed from the budget […]” [14].

² http://ec.europa.eu/budget/fts

Our contribution is an RDF transformation of the OpenSpending³ project, which provides government spending financial transactions from all over the world and is thus suitable as a core knowledge base that can be enriched and integrated with other, more focused data sets. Transforming OpenSpending to Linked Data and publishing it adds to and profits from the Semantic Web, which offers benefits including a standardized interface, easier data integration and complex queries over multiple knowledge bases.

³ http://openspending.org

The structure of the paper is as follows. Section 2 motivates the work and presents use cases. Section 3 describes OpenSpending, which is the source of the data, and its statistical data model. Section 4 explains the target RDF Data Cube vocabulary and the transformation process to it. Section 5 describes how and where the data set is published and in which way users can access the data. Section 6 gives an overview of the data sets, details the licence used and describes the data sets they are interlinked with. Section 7 presents related spending data sets available as Linked Open Data (LOD). The last section discusses known shortcomings of the data sets and future work. The prefixes used throughout this publication are defined in Table 1. In order to save space, prefixes are used even when technically incorrect, such as in ls:berlin_de/model.

Table 1

Namespaces and prefixes used in the paper

prefix	URL
os	http://openspending.org/
owl	http://www.w3.org/2002/07/owl#
ls	http://linkedspending.aksw.org/instance/
lso	http://linkedspending.aksw.org/ontology/
qb	http://purl.org/linked-data/cube#
sdmxd	http://purl.org/linked-data/sdmx/2009/dimension#
dbpedia	http://dbpedia.org/resource/
dbp	http://dbpedia.org/property/

2. Motivation

In a time of globalization, financial data becomes an international network. RDF data with its linked nature supports a representation that takes this network nature into account. As a machine interpretable format, it lowers the access barrier for application developers. For instance, generic Linked Data tools such as OntoWiki, CubeViz and Facete provide end users with the means to explore the data and discover new insights.

Economic analysis. LinkedSpending is represented as Linked Open Data, which facilitates data integration. Currencies from DBpedia and countries from LinkedGeoData are already integrated. Further integration candidates include political data and other statistical, policy-influencing data, for example on health care. This allows queries such as query 7 in Table 5, which asks for data sets with currencies whose inflation rates are greater than 10%.

LinkedSpending can also be used to compute economic indicators across several data sets. A possible indicator is a country’s spending on education per person where the population size can be taken from the LinkedGeoData countries linked from one or more budget data sets. One such data set is ugandabudget, which contains the Uganda Budget and Aid to Uganda, 2003–2006. LinkedSpending serves as a hub for the integration of those data sets and their provenance information. More data sets can be integrated with similarity-based interlinking tools such as LIMES [12].

Finding and comparing relevant data sets. Government spending amounts are often much higher than the sums ordinary people are used to dealing with, but even for policy makers it is hard to judge whether a certain amount of money spent is too high or normal. Comparing data sets and finding those which are similar to a given one helps to separate common values from outliers, which should be investigated further. For example, if another country has a similar budget structure but spends far less on health care with a similar health level, it should be investigated whether that discrepancy is caused by inherent differences, such as different minimum wages or a different climate, or whether it is due to preventable factors such as inefficiencies or corruption. While OpenSpending provides several hundred data sets which can be searched and allows browsing and visualization of any single one, it does not provide a comparison function between data sets. Because of the mechanism to identify equivalent properties (see Section 4), SPARQL queries can compare different data sets, e.g. similar structures in different countries. Query 9 in Table 5 shows a simple query to detect the data sets which are most similar to a particular data set. This is done by calculating the number of common measures, attributes and dimensions.
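The similarity measure described here, counting shared measures, attributes and dimensions, can also be sketched outside SPARQL. The following Python fragment is only an illustration under stated assumptions: the function name `most_similar`, the dictionary layout and the component sets are invented for this example and are not taken from LinkedSpending.

```python
def most_similar(target, structures):
    """Rank other data sets by the number of component properties shared with `target`.

    `structures` maps a data set name to the set of its component properties
    (measures, attributes and dimensions).
    """
    own = structures[target]
    counts = {
        name: len(own & components)
        for name, components in structures.items()
        if name != target
    }
    # Keep only data sets with at least one common property, most similar first.
    return sorted(
        ((name, count) for name, count in counts.items() if count > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Invented component sets, for illustration only.
structures = {
    "a": {"lso:amount", "lso:refYear", "lso:region"},
    "b": {"lso:amount", "lso:refYear"},
    "c": {"lso:amount"},
    "d": {"lso:other"},
}
ranking = most_similar("a", structures)
```

Ties are broken alphabetically here, which is a choice of this sketch and not something the paper specifies.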

3. OpenSpending source data

OpenSpending⁴ is a project which aims to track and analyze public spending worldwide and, as of May 2014, contains more than 25 million financial entries in 732 data sets.⁵ Data sets can be submitted and modified by anyone but they have to pass a sanity check from the OpenSpending Data Team, which also cleans the data before publishing.⁶ OpenSpending hosts transactional as well as budgetary data with a focus on government finance.⁷ It contains this data in structured form stored in database tables and provides searching and filtering as well as visualizations and a JSON REST interface. The data sets differ in granularity and type of accompanying information, but they share the same meta model.

⁴ http://openspending.org/

⁵ As some of them contain errors, the number of LinkedSpending data sets is slightly smaller.

⁶ http://community.openspending.org/contribute/data/

⁷ http://community.openspending.org/help/guide/en/financial-data-types/

3.1. The data cube model

The domain model of OpenSpending is a data cube (also OLAP cube, hypercube), which represents multi-dimensional statistical observations. Each cell corresponds to an observation (an instance of spending or revenue) that contains measurements (e.g. the amount of money spent or received). The context of the measurement is provided by the dimensions like the purpose, department and time of a spending item and optionally by attributes, which further describe the measured value, e.g., the unit of the measurement.

Figure 1 shows an excerpt from the model of the OpenSpending data set eu-budget with the dimension sub-programme and the measure amount. Figure 2 shows an entry that contains the actual values for the dimension and the measure of the observation.
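To make the roles of dimensions, measures and attributes concrete, the cube model can be sketched as a small Python data structure. This is a minimal illustration, not the OpenSpending implementation: the class `Observation`, the helper `total` and all values are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One cell of the cube: a single spending or revenue item."""
    dimensions: dict  # context of the measurement, e.g. department and year
    measures: dict    # measured values, e.g. the amount
    attributes: dict = field(default_factory=dict)  # e.g. the unit or currency


def total(observations, measure, **fixed_dimensions):
    """Sum one measure over all observations matching the given dimension values."""
    return sum(
        o.measures[measure]
        for o in observations
        if all(o.dimensions.get(k) == v for k, v in fixed_dimensions.items())
    )


# Invented example values, not taken from any OpenSpending data set.
cube = [
    Observation({"department": "education", "year": 2012}, {"amount": 100}, {"currency": "EUR"}),
    Observation({"department": "education", "year": 2013}, {"amount": 120}, {"currency": "EUR"}),
    Observation({"department": "health", "year": 2013}, {"amount": 80}, {"currency": "EUR"}),
]
```

Fixing one or more dimension values, as in `total(cube, "amount", year=2013)`, selects a subset of the cube and aggregates the measure over it.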

3.2. Problems

While the data is well-structured and thus suitable for conversion without data cleaning or extensive preprocessing, it still poses problems that need to be taken into account:

1. New data sets are frequently added (approximately 50 per month) and, less often, existing data sets are modified.
2. Some data sets do not specify a value for all properties in all observations.
3. There are properties with the same name in different data sets where it is unknown whether they specify the same property.
4. Data Cube is a meta model. The deep structure of the data sets is heterogeneous and described only shallowly.
5. The language of literals varies between and even within data sets, but the language used is not specified.

Points 1 to 3 are addressed in the next section while points 4 and 5 are discussed in Section 8.

Fig. 1. Simplified excerpt of an OpenSpending model.

Fig. 2. Simplified excerpt from an OpenSpending entry.

4. Conversion of OpenSpending to RDF

The RDF DataCube vocabulary [5], i.e. an RDF variant of the previously explained data cube model, is an ideal fit for the transformed data.

First and foremost, this vocabulary provides the backbone structure for every LinkedSpending data set, see Fig. 3. Each data set is represented by an instance of qb:DataSet and an associated instance of qb:DataStructureDefinition, which includes component specifications (see Fig. 4 for an example). Each component specification is associated with a component property, which can be either a dimension, an attribute or a measure. Commonly used concepts are specified in the model of the Statistical Data and Metadata eXchange (SDMX) initiative.⁸ The RDF Data Cube vocabulary is supported by the LOD2 Statistical Office Workbench,⁹ which is part of the Linked Data Stack (an advanced version of the LOD2 Stack [3]). The workbench includes a Data Cube validator, a split and merge component and a CKAN Publisher. OntoWiki [2], which manages several parts of the Linked Data Lifecycle [3], such as Storage/Querying and Search/Browsing/Exploration, offers a CSV import plugin for the format as well as a faceted RDF Data Cube browser, CubeViz. Data cubes may contain slices, which are presets for certain dimension values, effectively selecting a subset of a cube. Users may create and visualize their own slices using the OntoWiki CubeViz plugin. Furthermore, the RDF Data Cube vocabulary allows the persistence of slices, which is used to represent preconfigured slices from OpenSpending.

⁸ http://sdmx.org

⁹ http://demo.lod2.eu/lod2statworkbench

Fig. 3. Used RDF Data Cube concepts and their relationships.¹⁰

¹⁰ Simplified version of the structure described in [5].

Table 2

Conversion of OpenSpending to LinkedSpending classes and instances

	Source URL	JSON Path	LinkedSpending class	LS instance scheme
I	os: name .json		qb:DataSet	ls: name
II	os: name /model		qb:DataStructureDefinition	ls: name /model
III	os: name /model	$.mapping.*	os:{Country,Time}ComponentSpecification or qb:ComponentSpecification	lso: propertyname -spec
IV	os: name /model	$.mapping.*[?(@.type="compound")]	qb:DimensionProperty	lso: propertyname
V		$.mapping.*[?(@.type="date")]	qb:DimensionProperty
VI		$.mapping.*[?(@.type="measure")]	qb:MeasureProperty
VII		$.mapping.*[?(@.type="attribute")]	qb:AttributeProperty
VIII	os: name /entries.json	$.results[*].dataset	qb:Observation	ls:observation-dataset name-hashvalue

Fig. 4. RDF Data Cube vocabulary modelling excerpt of data set berlin_de (some properties and values omitted).

Transformation. All of the OpenSpending data sets describe observations referring to a specific point or period in time and thus undergo only minor changes. New data sets, however, are frequently added. Because of this, the huge number of data sets and their size, an automatic, repeatable transformation is required. This is realized by a program¹¹ that fetches a list of data sets on execution and only transforms the ones that are not transformed yet. Each data set is transformed separately. Table 2 shows, for each class used by LinkedSpending, at which URL (abbreviated using the prefixes from Table 1) the information used to create the instances of those classes is found. In case there are multiple instances described at one URL, a JSON path¹² expression is given that locates the corresponding subnodes. Finally, the table contains the patterns that describe the resulting LinkedSpending URLs. For example, the OpenSpending URL os:berlin_de/model contains the node $.mapping.amount which has a type value of “attribute” and is, thus, transformed to the LinkedSpending instance lso:amount of the class qb:AttributeProperty.

¹¹ Written in Java, available as open source at https://github.com/AKSW/openspending2rdf.

¹² JSON path (http://code.google.com/p/json-path/) is a query language for selecting nodes from JSON documents, similar to XPath for XML.
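Rows IV to VII of Table 2 amount to a lookup from the type of a mapping entry to an RDF Data Cube class. The following Python fragment sketches that lookup. It is not the Java converter itself: the function name `component_properties` and the example model, loosely modelled on Fig. 1, are assumptions, and real OpenSpending model documents contain many more fields.

```python
# Component classes per mapping type, following rows IV-VII of Table 2.
TYPE_TO_CLASS = {
    "compound": "qb:DimensionProperty",
    "date": "qb:DimensionProperty",
    "measure": "qb:MeasureProperty",
    "attribute": "qb:AttributeProperty",
}


def component_properties(model):
    """Yield (property URI, RDF Data Cube class) for each entry under $.mapping."""
    for name, spec in sorted(model.get("mapping", {}).items()):
        rdf_class = TYPE_TO_CLASS.get(spec.get("type"))
        if rdf_class is None:
            continue  # unknown type: skipped in this sketch
        yield f"lso:{name}", rdf_class


# Hypothetical, heavily simplified model document (not real OpenSpending output).
example_model = {
    "mapping": {
        "amount": {"type": "measure"},
        "sub-programme": {"type": "compound"},
        "time": {"type": "date"},
    }
}
properties = list(component_properties(example_model))
```

How the actual converter treats unknown types is not stated in the text; skipping them is a simplification of this example.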

Equivalent component properties (dimensions, attributes and measures) are identified as follows: A configuration file optionally specifies the mapping of data set and property name to an entity in the LinkedSpending ontology. By default, the property URI is derived from the property name. Properties with the same name in different data sets that do not have a mapping entry stating otherwise are assumed to represent the same concept and are thus given the same URL.¹³

¹³ Although that has the possibility of mismatches, such a mismatch has not been spotted yet. Still, evaluating and, if necessary, improving the automatic matching is part of future work.
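The matching rule for component properties can be sketched in a few lines of Python. The dictionary `overrides` stands in for the optional configuration file, whose actual format is not described here. Its layout, like the function name `property_uri`, is an assumption made for illustration.

```python
LSO = "http://linkedspending.aksw.org/ontology/"


def property_uri(dataset, name, overrides=None):
    """Return the URI for a component property.

    `overrides` maps (data set name, property name) to an explicit ontology
    entity. Without an entry, the URI is derived from the property name
    alone, so equally named properties in different data sets share one URI.
    """
    overrides = overrides or {}
    return overrides.get((dataset, name), LSO + name)


# Same name, different data sets: the default rule yields the same URI.
shared = property_uri("berlin_de", "amount") == property_uri("de-bund", "amount")

# A hypothetical override separates a property that means something else.
custom = property_uri("a", "x", {("a", "x"): LSO + "y"})
```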

Use of established vocabularies. In addition to the standard vocabularies RDF, RDFS, OWL and XSD, the DCMI vocabulary is used for source and generation time metadata. The data sets are modelled, first and foremost, according to the RDF Data Cube vocabulary, which specifies the structure of a data cube. LinkedSpending follows the RDF Data Cube recommendation to make heavy use of the SDMX model for measures, attributes and dimensions. The data sets are very heterogeneous but there are some properties which are commonly specified and thus modelled with established vocabularies. The year a data set refers to and the date an observation refers to are expressed using sdmx-dimension:refPeriod and XSD.

Currencies are taken from DBpedia [9] and countries are represented using the vocabulary of LinkedGeoData [16], a hub for spatial linked data. Some amount of data is imported from LinkedGeoData countries and DBpedia currencies. Because of the limited number of countries and currencies, and property values imported per country and currency, the amount of data is too small to consider federated querying. As most countries and currencies are stable in the medium term, this data needs to be updated only infrequently.

Interlinking. There are two possibilities to align entities to another vocabulary: 1) to use the entities directly and 2) to create a separate RDF resource with interlinks, like owl:sameAs, to that vocabulary. We generally preferred the first approach because a higher amount of reuse provides easier integration, better understandability and tool support. While we did not find sameAs link targets on observation level, i.e. exactly the same statistical observations described in other data sets, there are many possibilities for interlinks between data sets or dimension values and the concepts they refer to. Using the labels of those data sets and dimension values, it is possible, for example, to link values of the dimension “region” of a federal budget, and thus indirectly also the observations which use those values, to the cities in DBpedia or LinkedGeoData whose labels are contained in the label of the region value URI.

Error handling. The OpenSpending API lists 732 data sets, 627 of which have a LinkedSpending equivalent. The discrepancy is caused by loss in several stages. To prevent timeouts and to reduce the impact of disrupted connections, the source data set is downloaded in several parts, each with a maximum number of entries. These parts are then merged so that each file corresponds to exactly one data set. Data sets without observations are removed and the remaining data sets are transformed, noting the missing values for all component properties. If the first 1000 values are all missing, the transformation is aborted; otherwise, an lso:completeness value $c = \frac{|\text{existing values}|}{|\text{observations}| \cdot |\text{component properties}|}$ is attached to the data set. Besides empty or non-existing data sets, no other types of error were observed.
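The completeness value can be illustrated with a short Python function. This is a sketch under assumptions: observations are represented as dictionaries, a missing key or a `None` value counts as a missing value, and the abort rule is read as applying to the first 1000 cells in observation order. The function name `completeness` and the parameter `probe` are invented for this example.

```python
def completeness(observations, properties, probe=1000):
    """Return c = |existing values| / (|observations| * |component properties|).

    `observations` is a list of dicts mapping property name to value.
    Returns None when the transformation would be aborted.
    """
    if not observations or not properties:
        return None  # empty data sets are removed before this step
    # One cell per (observation, property) pair, in observation order.
    cells = [o.get(p) for o in observations for p in properties]
    if all(value is None for value in cells[:probe]):
        return None  # the first `probe` values are all missing: abort
    existing = sum(value is not None for value in cells)
    return existing / (len(observations) * len(properties))


# Two observations, two properties, one missing value: c = 3 / 4.
c = completeness(
    [{"amount": 5, "year": 2012}, {"amount": 7}],
    ["amount", "year"],
)
```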

Sustainability. The data conversion process is controlled by a web application,¹⁴ which constantly checks for added and modified data sets from OpenSpending. These are automatically queued for conversion but can also be managed manually. Updates do not interrupt the accessibility of the SPARQL endpoint and the services building on it. On average, about 50 new data sets became available each month between September 2013 and March 2014. A service monitor constantly checks the state of the application and reports errors.

¹⁴ http://linkedspending.aksw.org/api

Fig. 5. View of the data set berlin_de in the OntoWiki.

Performance. The transformation of a data set takes less than an hour on average on a 2 GHz virtual machine, using less than 2 GB of RAM.

5. Publishing

The data is published using OntoWiki [2]. The interface for human and machine consumption is available at http://linkedspending.aksw.org. Depending on the actor and their needs, OntoWiki provides several ways to access the published RDF data.

It can be explored by viewing the properties of a resource and their values and by following links to other resources (see Fig. 5). Using the SPARQL endpoint¹⁵ provided by the underlying Virtuoso Triple Store,¹⁶ actors are able to satisfy complex information needs.

¹⁵ http://linkedspending.aksw.org/sparql

¹⁶ http://virtuoso.openlinksw.com

Faceted search offers a selection of values for certain properties and thus allows users to slice and dice the data set on the fly according to their interests. For example, Fig. 6 depicts all Greek police spending in a certain region. Visualization supports the discovery of underlying patterns and of new insights about the data, for example about the relative proportions of a budget (see Fig. 7). We set up the RDF Data Cube browser CubeViz [15] as part of the human consumption interface.

Licensing. All published data is openly licensed under the PDDL 1.0 in accordance with the open definition.¹⁷

¹⁷ http://opendefinition.org/

Table 3

Technical details of the LinkedSpending data set

URL	http://linkedspending.aksw.org
Version date and number	2013-8-14, 0.1
Version date and number	2014-4-11, 2014-3
License	PDDL 1.0¹⁸
SPARQL endpoint	http://linkedspending.aksw.org/sparql
Compressed N-Triples Dump	http://linkedspending.aksw.org/extensions/page/page/export/lscomplete20143.tar.gz
datahub entry	http://datahub.io/dataset/linkedspending

Fig. 6. Faceted browsing in CubeViz by restricting values of dimensions.

Fig. 7. CubeViz visualization of the Romanian budget of 2013.

6. Overview over the data sets

LinkedSpending consists of 627 data sets (continually growing) with more than five million observations in total. The number of observations in the individual data sets varies considerably, between two (spending in Prague of about 5000 CZK for an unknown purpose) and 242 209 (“Spending from ministries under the Danish government”). Table 4 details the average and total amount of data in bytes, triples, and observations as well as the number of links to external data sets, which, for the presented version of 2014-3, amounts to more than 9 million links to LinkedGeoData countries and 1.5 million links to DBpedia currencies.¹⁹

¹⁸ http://opendatacommons.org/licenses/pddl/1.0/

¹⁹ The links are inflated as they originate in observations instead of data sets, which allows better querying and tool support.

Figure 8 shows the distribution of the numbers of measures, attributes and dimensions of the data sets.²⁰ Measures represent the quantity that an observation describes. All data sets have at least one measure, which is the amount of money spent or received. For most of them (217) that is the only one but there are data sets with up to 7 measures. Attributes give further context to the measurement. The number of attributes is more varied, ranging from 2 to 26, with all data sets having at least a currency and a country, and most of them additionally the time the observations refer to. While the number of dimensions ranges from none²¹ to 32, almost all of the data sets have between 1 and 6 dimensions, the most common ones being the year the data set refers to and the time the observations refer to. Technical details about the data sets are described in Table 3.

²⁰ This analysis relates to version 0.1, which contains fewer data sets.

²¹ There is only one data set with no dimensions, which is a test data set on OpenSpending, as a data cube with no dimensions is not useful.

Fig. 8. Histogram of measures, attributes and dimensions (version 0.1). 217 data sets have exactly one measure (clipped bar).

Example queries. Table 5 contains queries for common use cases. Queries 1–6 are basic queries. Query 7 uses the interlinking to DBpedia currencies by querying over two different graphs.²² Query 8 uses the custom vocabulary²³ which is available for each data set.

²² Parts of DBpedia and LinkedGeoData describing countries and currencies have been integrated in the SPARQL endpoint. With federated querying however, nearly the whole LOD cloud can be queried.

²³ In this case, the “Hauptfunktion” and “Oberfunktion” are unique to the berlin_de dataset.
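For machine access, a query from Table 5 can be passed to the SPARQL endpoint as the `query` parameter of an HTTP GET request, as defined by the SPARQL protocol. The sketch below only builds the request URL with the Python standard library and sends nothing. The helper name `sparql_get_url` is an assumption, and whether the endpoint is still reachable at this address is not verified here.

```python
from urllib.parse import urlencode

ENDPOINT = "http://linkedspending.aksw.org/sparql"  # endpoint URL from Table 3

# The qb prefix from Table 1, declared explicitly so the query is self-contained.
PREFIXES = "PREFIX qb: <http://purl.org/linked-data/cube#>\n"


def sparql_get_url(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET URL; the query text goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": PREFIXES + query})


# Query 1 of Table 5: list all data sets.
url = sparql_get_url("select * {?d a qb:DataSet}")
```

The resulting URL can be opened with any HTTP client. The result format returned depends on the endpoint's content negotiation.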

7. Related work

The TWC Data-Gov Corpus [6,7] consists of linked government data from the Data-gov project. However, it only contains transactions made in the US and does not overlap with OpenSpending. The publicspending.gr project generates and publishes [17] public spending data from Greece based on the UK payment ontology and without using statistical data cubes. The UK government expenditure dataset COINS²⁴ is available as Linked Data.²⁵ LOD Around-The-Clock (LATC)²⁶ is a project which was funded by the European Union (EU) and converted European open government data into RDF. One of its outcomes is the FTS²⁷ [11] project, which transforms and publishes financial transparency data on EU spending. In comparison with LinkedSpending, those projects also contribute linked government data but with a different or more limited scope.

²⁴ http://data.gov.uk/dataset/coins

²⁵ http://openuplabs.tso.co.uk/sparql/gov-coins, in a beta version.

²⁶ http://latc-project.eu

²⁷ http://ec.europa.eu/budget/fts

Furthermore, the Digital Agenda Scoreboard [10] is an EU project that keeps track of the transformation of statistical data to RDF.

Table 4

Amount of data for version 2014-3. All values are rounded to the nearest integer

	Total	Average
number of data sets	627
file size (RDF/N-Triples)	24 585 MB	39 MB
triples	113 640 534	181 245
observations	5 026 393	8017
links to external data sets	10 696 614	17 060

8. Conclusions, shortcomings, future work

As shown in Section 4, we converted several hundreds of financial data sets to RDF and, as shown in Section 5, we published them as Linked Open Data in several ways. However, we recognise a few shortcomings and our goal is to enrich the meta data with the help of domain experts and to refine the structure of the individual data sets. Furthermore, we plan to improve the automatic configuration of CubeViz.

Multilinguality. RDF itself provides support for multilingualism, which is one of its key advantages over other representation formats. The source data does not contain language tags, however, and the languages used do not always match the country the data refers to. Automatic language detection on single labels did not yield a satisfying precision, and it is not possible to increase the precision of the language detection by combining the estimates for several labels of an observation as their language is not always identical. We plan statistical examinations of the relations between labels of different entities and more complex schemes based on those examinations, which can achieve language detection with a higher precision. Additionally, we plan to automatically translate all literals to several languages.

Individual modelling. Because the source data is already structured, it was feasible to transform all the data sets automatically and without the need for text extraction. On a deep level however, there is much unmodelled structure that is unique to each data set or at most shared between several of them, for instance the categorization of spending into several specific “plans” in German budgets. Because of the number of data sets, modelling all details, and thus also improving the internal and external connectivity, requires either a large-scale cooperation or a crowd-driven approach, which we have not yet undertaken.

Drilldowns. Because of the hierarchical organization of the different coded properties “groups” and “functions”, the visualizations on openspending.org permit “zooming” (drilldown) in and out of the different levels of the data. The RDF Data Cube vocabulary specifies the use of skos:ConceptScheme or qb:HierarchicalCodeList but neither variant is fully implemented and it is not clear which of those modelling possibilities will become standard.

Table 5

Exemplary SPARQL queries for typical use cases

	Information need	SPARQL Query
1	list of all data sets	select * {?d a qb:DataSet}
2	all measures of the dataset berlin_de	select ?m { ls:berlin_de qb:structure ?s. ?s qb:component ?c. ?c qb:measure ?m.}
3	all years which have observations in the de-bund dataset from 2020 onwards	select distinct ?year {?o a qb:Observation. ?o qb:dataSet ls:de-bund. ?o lso:refYear ?year. FILTER (xsd:date(?year) >= "2020-01-01"^^xsd:date) }
4	spendings of more than 100 billion €	select * {?o lso:amount ?a. ?o dbo:currency dbpedia:Euro. FILTER(xsd:integer(?a) > 100000000000) }
5	data sets with multiple years	select ?d (count(?y) as ?count) { ?d a qb:DataSet. ?d lso:refYear ?y. } group by ?d having (count(?y)>1)
6	sums of amounts for each reference year of berlin_de	select ?y (sum(xsd:integer(?amount)) as ?sum) {?o qb:dataSet ls:berlin_de. ?o lso:refYear ?y. ?o lso:amount ?amount.} group by ?y
7	data sets with currencies whose inflation rate is greater than 10%	select distinct ?d ?c ?r {?o qb:dataSet ?d. ?o dbo:currency ?c. ?c dbp:inflationRate ?r. filter(?r > 10)}
8	Berlin city subsectors of research and education that have had their budget reduced from 2012 to 2013 (data set version 0.1)	select ?l (sum(xsd:integer(?amount12)) as ?sum12) (sum(xsd:integer(?amount13)) as ?sum13) {?o qb:dataSet ls:berlin_de. ?o lso:Hauptfunktion <http://openspending.org/berlin_de/Hauptfunktion/1>. ?o lso:Oberfunktion ?of. ?of rdfs:label ?l. {?o lso:refYear "2012"^^xsd:gYear. ?o lso:amount ?amount12.} UNION {?o lso:refYear "2013"^^xsd:gYear. ?o lso:amount ?amount13.}} group by ?l having (sum(xsd:integer(?amount12)) > sum(xsd:integer(?amount13)))
9	data sets ordered by their number of properties in common with 2012_tax (having at least one such common property)	select ?d (count(?c) as ?count) {ls:2012_tax qb:structure ?s. ?s qb:component ?c. ?d qb:structure ?s2. ?s2 qb:component ?c. FILTER(?d!=ls:2012_tax)} group by ?d order by desc(?count)

Interlinking. Extensive interlinking of referenced entities to the all-purpose knowledge base of DBpedia provides additional context. Coded property values, such as the budget areas healthcare and public transportation, can be interlinked with their respective DBpedia concepts. This enables the usage of type hierarchies and thus new ways of structuring the data and provides more meaningful aggregations and new insights.

Question answering. We plan to develop a question answering system that allows accessing statistical Linked Data in the form of RDF Data Cubes using natural language questions [8]. LinkedSpending is used both as the first knowledge base and for performance evaluation.

Acknowledgements

Special thanks go to the people behind the OpenSpending project, including Friedrich Lindenberg for suggesting the conversion.

References

[1] J.E. Alt, D.D. Lassen and Skilling, Fiscal transparency, gubernatorial popularity, and the scale of government: evidence from the states, Technical report, Economic Policy Research Unit (EPRU), University of Copenhagen, 2001.

[2] Auer, Dietzold, Lehmann and Riechert, OntoWiki: a tool for social, semantic collaboration, in: Proc. of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) at the 16th International World Wide Web Conference (WWW2007), Banff, Canada, May 8, 2007, N.F. Noy, Alani, Stumme, Mika, Sure and Vrandecic, eds, CEUR Workshop Proceedings, Vol. 273, CEUR-WS.org, 2007.

[3] Auer, Bühmann, Dirschl, Erling, Hausenblas, Isele, Lehmann, Martin, P.N. Mendes, van Nuffelen, Stadler, Tramp and Williams, Managing the life-cycle of Linked Data with the LOD2 stack, in: The Semantic Web – ISWC 2012, Cudré-Mauroux, Heflin, Sirin, Tudorache, Euzenat, Hauswirth, Parreira, Hendler, Schreiber, Bernstein and Blomqvist, eds, Springer-Verlag, Berlin, Heidelberg, 2012, pp. 1–16.

[4] Berners-Lee, Putting government data online – design issues, 2009, W3C design issue.

[5] Cyganiak and Reynolds, The RDF Data Cube vocabulary, W3C recommendation, 2014.

[6] Ding, DiFranzo, Graves, Michaelis, Li, D.L. McGuinness and Hendler, Data-gov wiki: towards linking government data, in: Proc. of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Vol. 10, AAAI Press, Atlanta, Georgia, 2010, p. 1.

[7] Ding, DiFranzo, Graves, J.R. Michaelis, Li, D.L. McGuinness and J.A. Hendler, TWC data-gov corpus: incrementally generating linked government data from data.gov, in: WWW ’10: Proc. of the 19th International Conference on World Wide Web, ACM, New York, NY, USA, 2010.

[8] Höffner and Lehmann, Towards question answering on statistical Linked Data, in: Proc. of the 10th International Conference on Semantic Systems, Sack, Filipowska, Lehmann and Hellmann, eds, ACM, New York, NY, USA, 2014, pp. 61–64.

[9] Lehmann, Isele, Jakob, Jentzsch, Kontokostas, P.N. Mendes, Hellmann, Morsey, van Kleef, Auer and Bizer, DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web Journal (2014).

[10] Martin, van Nuffelen, Abruzzini and Auer, The digital agenda scoreboard: a statistical anatomy of Europe’s way into the information age, Technical report, University of Leipzig, 2012.

[11] Martin, Stadler, Frischmuth and Lehmann, Increasing the financial transparency of european commission project funding, Semantic Web Journal, Special Call for Linked Dataset Descriptions 2 (2013), 157–164.

[12] A.-C. Ngonga Ngomo, A time-efficient hybrid approach to link discovery, in: Ontology Matching, OM-2011, Proc. of the ISWC Workshop, Shvaiko, Euzenat, Heath, Quix, Mao and Cruz, eds, 2011, pp. 1–12.

[13] S.J. Piotrowski and G.G. Van Ryzin, Citizen attitudes toward transparency in local government, The American Review of Public Administration 37(3) (2007), 306–323.

[14] Regulation (EU, Euratom) no 966/2012, 2012, Article 35: publication of information on recipients and other information.

[15] P.E. Salas, Maia Da Mota, Breitman, M.A. Casanova, Martin and Auer, Publishing statistical data on the web, International Journal of Semantic Computing 6(4) (2012), 373–388.

[16] Stadler, Lehmann, Höffner and Auer, Linkedgeodata: a core for a web of spatial open data, Semantic Web Journal 3(4) (2012), 333–354.

[17] Vafopoulos, Meimaris, Anagnostopoulos, Papantoniou, Xidias, Alexiou, Vafeiadis, Klonaras and Loumos, Public spending as LOD: the case of Greece, Semantic Web Journal (2013).
