The file contains 3.1 million rows, each representing one article observed at one point in time.
The file uses these columns:
timestamp: The time (in UTC) of the fetch. All articles from the same fetch will have the same timestamp.
position: The article’s zero-indexed position in the trending strip, from left to right.
text: The text of the link used to highlight the article. Note: Sometimes the same article is associated with different text at different points in time.
url: The link’s URL. Note: Sometimes (although relatively rarely) the URL for the same underlying article changes over time.
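As a rough illustration of how these columns fit together, here is a minimal Python sketch that parses rows, groups them by fetch, and collects the different link texts seen for each URL. It assumes the file is a CSV with a header row and ISO 8601 timestamps ending in "Z"; neither detail is stated above, so adjust as needed. The sample rows are made up purely for demonstration and do not come from the real file.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up sample rows for illustration only; NOT taken from the real file.
# Assumption: CSV with a header row and ISO 8601 UTC timestamps.
SAMPLE = """timestamp,position,text,url
2020-01-01T00:00:00Z,0,Example headline A,https://example.com/a
2020-01-01T00:00:00Z,1,Example headline B,https://example.com/b
2020-01-01T00:05:00Z,0,Example headline B (reworded),https://example.com/b
"""


def parse_rows(handle):
    """Yield one dict per row, with typed timestamp and position fields."""
    for row in csv.DictReader(handle):
        fetched_at = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        yield {
            "timestamp": fetched_at.replace(tzinfo=timezone.utc),
            "position": int(row["position"]),  # zero-indexed, left to right
            "text": row["text"],
            "url": row["url"],
        }


rows = list(parse_rows(io.StringIO(SAMPLE)))

# Rows sharing a timestamp belong to the same fetch.
fetches = {}
for row in rows:
    fetches.setdefault(row["timestamp"], []).append(row)

# The same article can appear under different link text over time.
texts_by_url = {}
for row in rows:
    texts_by_url.setdefault(row["url"], set()).add(row["text"])
```

For the real file, the same `parse_rows` function could be given an open file handle instead of the in-memory sample. Because URLs occasionally change for the same underlying article, grouping by `url` alone will slightly undercount how often an article reappears.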
Note: Although the script generally ran every five minutes, there are some gaps in the data, accounting for roughly 3% of the total time period covered. These gaps stem from two main factors: technical complications (such as server downtime) and periods during which the website replaced the trending strip with breaking news alerts, single-story highlights, or other notices. Unfortunately, I did not have the foresight to collect data that would distinguish between those two scenarios.
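One way to locate these gaps is to compare consecutive fetch timestamps against the expected five-minute interval. The sketch below is a rough approach rather than the method used to arrive at the 3% figure: the one-minute tolerance and the function names are assumptions chosen for illustration, and the example timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

EXPECTED = timedelta(minutes=5)   # nominal fetch interval described above
TOLERANCE = timedelta(minutes=1)  # assumed slack for scheduling jitter


def find_gaps(fetch_times, expected=EXPECTED, tolerance=TOLERANCE):
    """Return (earlier, later) pairs whose spacing exceeds expected + tolerance."""
    ordered = sorted(set(fetch_times))
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > expected + tolerance
    ]


def missing_fraction(fetch_times, expected=EXPECTED, tolerance=TOLERANCE):
    """Estimate the share of the covered period that falls inside gaps."""
    ordered = sorted(set(fetch_times))
    if len(ordered) < 2:
        return 0.0
    total = ordered[-1] - ordered[0]
    # Each gap is "missing" for its length minus one normal interval.
    missing = sum(
        ((later - earlier) - expected
         for earlier, later in find_gaps(ordered, expected, tolerance)),
        timedelta(),
    )
    return missing / total


# Made-up timestamps: fetches at minutes 0, 5, 10, then a jump to 30 and 35.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
times = [start + timedelta(minutes=5 * i) for i in (0, 1, 2, 6, 7)]
gaps = find_gaps(times)
fraction = missing_fraction(times)
```

Note that this only finds where fetches are missing. It cannot tell server downtime apart from periods when the trending strip was replaced, since the data does not record that distinction.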
submitted by /u/brianckeegan