Sample Data
Page Contents
- Apache Software Foundation Issue Tracker
- NASA Apache Web Logs
- Wikipedia Web Logs
- World Cup 2014 Player Data
Apache Software Foundation Issue Tracker
The sample time-series dataset in apachejira comes from the public Apache instance of JIRA. It was converted to a TSV file by a Java program that uses the JIRA API. The file has the following fields:
action | The action taken, one of “comment”, “create”, or “update”. |
actor | The name of the user who took the action. |
assignee | The name of the user to which this issue is assigned. |
category | The category to which the issue belongs. Multiple projects roll up to a single category. |
fieldschanged | The fields modified by this action, as a single field containing a space-separated list. For example, if the action changed the assignee and status, there is one result: “assignee status”. More appropriate for display. |
fieldschangedtok | The fields modified by this action, as a multi-valued field containing one result for each field changed. For example, if the action changed the assignee and status, there are two results: “assignee” and “status”. More appropriate for filtering. |
fixversion | The fixversions this issue was set to when this action was taken. A single field containing a pipe (|) separated list of fixversions. For example, if this issue is assigned to fixversions master and 7.0, there is one result: “master|7.0”. More appropriate for display. |
fixversiontok | The fixversions this issue was set to when this action was taken. A multi-valued field containing one result for each fixversion. For example, if this issue is assigned to fixversions master and 7.0, there are two results: “master” and “7.0”. More appropriate for filtering. |
issueage | The number of seconds between when this action took place and when the issue was created. |
issuekey | The key of the specified JIRA ticket. |
issuetype | The type of this ticket. |
prevstatus | The status of the ticket prior to this action. If the action did not change status, this will always be the same as the current status. |
project | The name of the project (not the abbreviation) to which this issue belongs. |
reporter | The name of the person that created this issue. |
resolution | If this issue is resolved, the string value of the Resolution field. Otherwise blank. |
status | The status of the issue. |
summary | The summary (i.e., short description or name) of the ticket. |
timeinstate | The number of seconds between when this action took place and when the last action changed this issue’s status. |
timesinceaction | The number of seconds between this action and the last action. |
unixtime | The unix timestamp for this action. |
Adapted from: Apache Software Foundation JIRA
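The relationship between the display fields and their tokenized counterparts can be sketched in Python. This is a minimal illustration, not the Java converter mentioned above: the `tokenize` helper is hypothetical, and only the separators (a space for fieldschanged, a pipe for fixversion) and the sample values come from the field descriptions.

```python
def tokenize(display_value, separator):
    """Split a single display string into the values a tokenized field would hold."""
    # An empty display value means no values at all, not one empty token.
    if not display_value:
        return []
    return display_value.split(separator)


# Sample values taken from the field descriptions above.
fieldschanged = "assignee status"
fixversion = "master|7.0"

fieldschangedtok = tokenize(fieldschanged, " ")
fixversiontok = tokenize(fixversion, "|")

print(fieldschangedtok)  # ['assignee', 'status']
print(fixversiontok)     # ['master', '7.0']
```

The display form keeps everything in one string, while the tokenized form lets a filter match a single value such as “status” without substring matching.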
NASA Apache Web Logs
The sample time-series dataset in nasa_19950801.tsv comes from the public 1995 NASA Apache web logs. The file contains data for a single day and is in an Imhotep-friendly TSV format.
A Perl script was used to convert the Apache web log into the TSV format, extracting the following fields:
host | When possible, the hostname making the request. Uses the IP address if the hostname was unavailable. |
logname | Unused, always - |
time | Time of the request, in seconds since the Unix epoch (1970-01-01 00:00:00 UTC) |
method | HTTP method: GET, HEAD, or POST |
url | Requested path |
response | HTTP response code |
bytes | Number of bytes in the reply |
Here is an example line (or document) from the dataset:
piweba3y.prodigy.com - 807301196 GET /shuttle/missions/missions.html 200 8677
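As a rough sketch of how such a line maps onto the fields listed above, the following Python splits one document into a dictionary. The `parse_line` helper is hypothetical, and it assumes that the fields are separated by tabs in the file (they may render as spaces here) and that time, response, and bytes always hold integers.

```python
FIELDS = ["host", "logname", "time", "method", "url", "response", "bytes"]
INT_FIELDS = {"time", "response", "bytes"}


def parse_line(line):
    """Map one tab-separated line onto the field names from the table."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Assumption: numeric fields are always plain integers in the converted file.
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


# The example document, written with explicit tab separators.
example = "piweba3y.prodigy.com\t-\t807301196\tGET\t/shuttle/missions/missions.html\t200\t8677"
record = parse_line(example)
print(record["host"], record["time"], record["response"], record["bytes"])
```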
The timestamp 807301196 is the conversion of 01/Aug/1995:13:19:56 -0500 using Perl:
use Date::Parse; $in = "01/Aug/1995:13:19:56 -0500"; $out = str2time($in); print "$out\n";
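For readers without Perl at hand, the same conversion can be cross-checked with Python's standard library. This is only an equivalent sketch, not the script used to build the dataset; the function name `apache_time_to_epoch` is made up for illustration, and the `%b` month parsing assumes the default English (C) locale.

```python
from datetime import datetime


def apache_time_to_epoch(value):
    """Convert an Apache log timestamp with a UTC offset to Unix seconds."""
    # %z consumes the "-0500" offset, so the parsed datetime is timezone-aware
    # and timestamp() does not depend on the local timezone of the machine.
    parsed = datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")
    return int(parsed.timestamp())


print(apache_time_to_epoch("01/Aug/1995:13:19:56 -0500"))  # 807301196
```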
Data for two months are available in these compressed files:
nasa_19950630.22-19950728.12.tsv.gz
nasa_19950731.22-19950831.22.tsv.gz
TSV Data Size (raw uncompressed) | Imhotep Data Size |
---|---|
256 MB | 19 MB |
Source: Internet Traffic Archive
Wikipedia Web Logs
The time-series data in wikipedia_e_20140913.11.tsv.gz is one hour of data from September 13, 2014 for Wikipedia articles beginning with the letter E.
Each document corresponds to a Wikipedia article that was served in that hour:
title | Title of the article on Wikipedia |
categories+ | List of categories in which the article is contained |
titleWords+ | List of words in the title |
linksOut+ | List of Wikipedia articles linked by the article |
numRequests | Number of requests for the article in that hour |
bytesServed | Number of bytes served for the article in that hour |
The most popular E entry in that hour was English_alphabet.
title | categories+ | titleWords+ | linksOut+ | numRequests | bytesServed |
---|---|---|---|---|---|
English_alphabet | All_Wikipedia_articles_needing_clarification All_articles_needing_additional_references All_articles_with_unsourced_statements Articles_containing_Old_English-language_text Articles_needing_additional_references_from_June_2011 Articles_with_hAudio_microformats Articles_with_unsourced_statements_from_January_2011 Articles_with_unsourced_statements_from_July_2010 Articles_with_unsourced_statements_from_March_2014 English_spelling Latin_alphabets Wikipedia_articles_needing_clarification_from_August_2013 | English alphabet | A Adjective Aircraft Alphabet_song American_English American_braille American_manual_alphabet Ampersand Anglo-Saxon_futhorc Anglo-Saxons Ansuz_(rune) Apostrophe B Body_cavity British_English Byrhtfert ... | 960 | 21124206 |
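The example row suggests how a document might be read programmatically. The sketch below is written under two assumptions that the text does not state outright: that a `+` suffix on a field name marks a list-valued field, and that list values are separated by whitespace, as they appear in the row above. The `parse_document` helper is hypothetical, and the list cells are shortened to a few values from the example.

```python
MULTI_VALUE_SUFFIX = "+"  # assumption: marks list-valued fields


def parse_document(header, row):
    """Build a dict from one row, splitting list-valued fields on whitespace."""
    doc = {}
    for name, value in zip(header, row):
        if name.endswith(MULTI_VALUE_SUFFIX):
            doc[name] = value.split()
        else:
            doc[name] = value
    return doc


header = ["title", "categories+", "titleWords+", "linksOut+", "numRequests", "bytesServed"]
row = [
    "English_alphabet",
    "English_spelling Latin_alphabets",  # subset of the categories shown above
    "English alphabet",
    "A Adjective Aircraft",              # subset of the outgoing links shown above
    "960",
    "21124206",
]

doc = parse_document(header, row)
average_bytes = int(doc["bytesServed"]) / int(doc["numRequests"])
print(doc["titleWords+"])    # ['English', 'alphabet']
print(round(average_bytes))  # 22004
```

Dividing bytesServed by numRequests gives roughly 22 KB served per request for this article in that hour.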
TSV Data Size (raw uncompressed) | Imhotep Data Size |
---|---|
2450 GB | 272 GB |
Source: https://dumps.wikimedia.org/other/pagecounts-raw/ for page counts and https://dumps.wikimedia.org/backup-index.html for all other fields
World Cup 2014 Player Data
The dataset in worldcupplayerinfo_20140701.tsv includes information about players in the 2014 World Cup. Because this is not typical Imhotep time-series data, all documents are assigned the same timestamp: 2014-07-01 00:00:00.
Each document in the dataset includes information about a single player:
Player | String | Player’s name. |
Age | Int | Player’s age. |
Captain | Int | Value (1 or 0) indicates whether the player is a captain. |
Club | String | The player’s club when not playing for the national team in the World Cup. |
Country | String | The country the player represents in the World Cup. |
Group | String | The World Cup group to which the player’s national team belongs. |
Jersey | Int | The player’s jersey number. |
Position | String | The player’s position. |
Rank | Int | The ranking of the country the player represents. |
Selections | Int | The number of World Cup appearances for this player. |
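The String and Int types listed above can be applied to raw TSV values with a small schema map. The following Python is a sketch only: the `coerce` and `is_captain` helpers are hypothetical, and the sample values are placeholders invented for illustration, not a row from the dataset.

```python
# Field names and types as listed in the table above.
SCHEMA = {
    "Player": str, "Age": int, "Captain": int, "Club": str, "Country": str,
    "Group": str, "Jersey": int, "Position": str, "Rank": int, "Selections": int,
}


def coerce(raw):
    """Convert a dict of raw TSV strings to the types named in the schema."""
    return {name: SCHEMA[name](value) for name, value in raw.items()}


def is_captain(player):
    """Captain is stored as an Int flag: 1 for captains, 0 otherwise."""
    return player["Captain"] == 1


# Placeholder values for illustration only; not a real row from the dataset.
raw = {
    "Player": "Example Player", "Age": "27", "Captain": "1", "Club": "Example FC",
    "Country": "Exampleland", "Group": "A", "Jersey": "10",
    "Position": "Midfielder", "Rank": "5", "Selections": "40",
}

player = coerce(raw)
print(player["Age"] + 1, is_captain(player))
```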
TSV Data Size (raw uncompressed) | Imhotep Data Size |
---|---|
45 KB | 15 KB |
Source: Stack Exchange Network / Open Data
The data are distributed under the Creative Commons Attribution-ShareAlike 4.0 International license. The creator of the data is http://opendata.stackexchange.com/users/3061/bryan. In compliance with this license, the data are hereby attributed to the users and owners of StackOverflow, but not in such a way as to suggest that they endorse Indeed or Indeed’s use of the data.