COSMOS 2.0 Workshop

Introduction to COSMOS

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • What is COSMOS useful for?

Objectives
  • Understand how COSMOS can benefit our research.

  • Describe COSMOS’s uses.

  • Locate helpful resources to learn more about COSMOS.

Motivation

Features

Before We Start

Getting help for COSMOS

You can find out more about COSMOS at http://socialdatalab.net/COSMOS and watch the introduction videos at http://socialdatalab.net/instruction-videos. To obtain COSMOS, fill in the form on the website to request an install link. You will then receive an email from cosmosprojectuk@gmail.com that includes the download link and links to forms for reporting bugs and requesting new features. If you ever need help using COSMOS, you can contact us via this email address. To stay up to date with COSMOS development, check the Blogposts section in the website menu.

Key Points

  • COSMOS is compact, powerful and free software to collect and analyse social media data.

  • It aims to democratise access to social media data for social scientists and others.


Before using COSMOS

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • How do we authorise the collection of data from Twitter?

  • How do we get the latest version of COSMOS?

Objectives
  • Authorising your Twitter account to use this collection service.

  • Updating COSMOS to the latest version before using it.

Lessons

1) Authorising your Twitter Account

After installing and setting up COSMOS Desktop, authorise your Twitter account before starting a collection:

  1. Click the plus button on the top left corner (in the show panel).
  2. Choose the ‘Start Twitter Collection’ option from the dropdown list.
  3. Click the ‘Sign in with Twitter’ button on the pop-up window.

  4. Fill in the form with your Twitter username and password and click the ‘Authorize app’ button. If you do not have a Twitter account, you can create one by clicking the ‘Sign up for Twitter’ link in the top right corner of the page.

  5. After the ‘login was successful’ message, the Twitter Collector pop-up window appears, which means COSMOS is ready to start a collection.

2) Check for Updates

Once COSMOS is installed on your computer, you do not need to reinstall it to receive new updates. Because COSMOS is in active development, it includes a feature for getting the latest version. COSMOS automatically checks for updates each time it starts. You can also click the menu button in the top right corner of the page and choose the ‘Check for updates’ option.

If there is a new version, click ‘yes’ to upgrade to the latest version of COSMOS. To use the latest version, you need to restart the COSMOS application.


A YouTube video demonstrating the full COSMOS authorisation process is also available.


Exercises

1) Perform the steps to authorise your Twitter account for tweet collection.

2) Check for updates before using COSMOS. Did COSMOS automatically restart?

Key Points

  • To start a data collection, you must authorise your Twitter account.

  • To use the latest version of COSMOS, check for updates each time you run it.


Data collection screen

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How do we collect data from Twitter?

  • How can we specify criteria for a data collection?

Objectives
  • Understanding the data collection process.

Before We Start

Lessons


1) Collection Process

The Twitter API allows COSMOS to stream public Tweets from the platform in real time. COSMOS uses the filtered stream endpoints to narrow the stream to the criteria you define in the COSMOS interface, such as keywords, hashtags, account names, language and place (see the Data Collection Screen section). For more information about filtering real-time tweets, see: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/basic-stream-parameters

Creating a filter is helpful because it prevents you from collecting unwanted tweets for your research.

After you start a collection through COSMOS, you will begin receiving a stream of data. All streaming data is temporarily saved in JSON format in the /home/COSMOS-files/tmp folder. For each collection, COSMOS creates a folder in the tmp folder with the same name as the collection. Folder structure:

<%collection_year%>/<%collection_month%>/<%collection_day%>/twitter-collection-<%hour%>.json

Since COSMOS does not show all entities from the raw streamed data in its interface, the temporary files can be useful if you need to access them. Later, COSMOS creates a local database on your machine and moves the dataset from the ‘tmp’ folder to the database.
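
As a rough illustration of how the temporary files could be located, the sketch below walks a collection folder laid out as described above and lists its hourly JSON files. The helper name `list_hourly_files` is hypothetical, and the sketch assumes the collection folder sits directly inside the tmp folder; it is not part of COSMOS, so adjust the paths to match your installation.

```python
from pathlib import Path


def list_hourly_files(tmp_root, collection_name):
    """Return the hourly JSON files of one collection, sorted by path.

    Assumes the layout
    <tmp_root>/<collection>/<year>/<month>/<day>/twitter-collection-<hour>.json
    described in this lesson.
    """
    collection_dir = Path(tmp_root) / collection_name
    # Three wildcard levels stand for the year, month and day folders.
    return sorted(collection_dir.glob("*/*/*/twitter-collection-*.json"))
```

For example, `list_hourly_files("/home/COSMOS-files/tmp", "my-collection")` would return the files of a collection named `my-collection` (a made-up name), which you could then open with any JSON-aware tool.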

2) Data Collection Screen

After setting up COSMOS on your machine, you can start the software as instructed.

Once COSMOS is launched in your browser, click the plus button on the top left corner. You will see the options Import Data, Import RSS Feed and Start Twitter Collection. You can generate a Twitter collection by filtering or sampling:

Filter: COSMOS provides a filtering feature to narrow a collection while streaming data from Twitter.

  1. Click Start Twitter Collection.
  2. Choose Filter as the collection type.
  3. Give the collection an appropriate name in the Twitter Collector pop-up window.
  4. Fill in the form to set the filtering criteria: Keywords and hashtags, Language, Location, Twitter accounts and Maximum number of tweets.
  5. While filling in the Keywords and hashtags, Twitter accounts and Language sections, press Enter after typing each entry. Once you press Enter, the entry should turn blue.
  6. When the form is complete, click the submit button. Once the collection starts, it appears on the show panel.

Tip:

  1. When you enter multiple keywords, hashtags or accounts to filter the collection, the Twitter API combines them using logical OR. For example, if you start a collection with the keywords covid, vaccine and coronavirus and the Twitter account @BorisJohnson, you would get:
    • tweets posted by Boris Johnson (they do not have to be related to covid),
    • tweets that contain the keywords covid, vaccine or coronavirus.
  2. Unlike keywords, hashtags and accounts, the language filter is applied with logical AND. If we add an English language filter to the example collection above, you will get only English tweets that match the keywords or accounts.

You can learn more about this at https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/basic-stream-parameters
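
The OR/AND behaviour described in the tip can be sketched in a few lines of Python. This is only an illustration of the logic, not how the Twitter API or COSMOS is implemented: the function name `matches_filter` and the dictionary keys `text`, `screen_name` and `lang` are assumptions, and plain substring matching is a simplification of Twitter's own matching rules.

```python
def matches_filter(tweet, keywords=(), accounts=(), languages=()):
    """Mimic the OR/AND behaviour described in the tip.

    `tweet` is a hypothetical dict with 'text', 'screen_name' and 'lang' keys.
    """
    text = tweet["text"].lower()
    keyword_hit = any(k.lower() in text for k in keywords)
    wanted_accounts = {a.lstrip("@").lower() for a in accounts}
    account_hit = tweet["screen_name"].lower() in wanted_accounts
    # Keywords/hashtags and accounts are combined with OR ...
    content_ok = keyword_hit or account_hit
    # ... while the language filter, if one is given, is applied with AND.
    language_ok = not languages or tweet["lang"] in languages
    return content_ok and language_ok


# Example: a Spanish tweet mentioning covid is dropped by an English filter.
keywords = ["covid", "vaccine", "coronavirus"]
accounts = ["@BorisJohnson"]
tweet = {"text": "vacuna covid", "screen_name": "someone", "lang": "es"}
print(matches_filter(tweet, keywords, accounts, languages=["en"]))  # prints False
```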

Sample: COSMOS can stream tweets without any filter or specification.

  1. Click Start Twitter Collection.
  2. Choose Sample as the collection type.
  3. Specify the Collection name, Collection description and Maximum number of tweets, then click the Submit button.
  4. Once you click Submit, the sample dataset appears on the show panel.

3) More on Collections

The collection takes time as it streams tweets in real-time.
While the collection is running, click the three dots on the show panel to:

  • Stop the collection when enough data has been collected.
  • Snapshot the collection to create a subset consisting of the data collected up to the time of the snapshot.

When the collection has been stopped, click the three dots on the show panel to access the following options:

  • Query: You can filter your collected data by tweet sentiment, date, gender, language and country. Querying the dataset creates a subset of the data based on the query details. This feature helps to remove noisy data and shorten the data analysis process.
  • Export Data Details
  • Export collected data: Exports collected data as a CSV file.
  • Delete
  • Details of data set: Exports the dataset’s details as a JSON file, e.g. date started, name of the collection, …

A YouTube video demonstrating the full COSMOS data collection process is also available.


Tip:

  • A maximum of two data collections can be started simultaneously because of Twitter API rate limits. You can learn more about rate limits at https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/basic-stream-parameters

Exercise

Collect 10,000 English tweets with COSMOS 2.0, using keywords and Twitter accounts of your choice.

Exercise

While the collection continues:

  1. Create a subset and give it an appropriate name. Which COSMOS feature did you use for this purpose?
  2. Create and name a subset containing only tweets with negative sentiment from the last two days. Which COSMOS feature did you use for this purpose?

Solution

  1. Click the three dots on the panel and select Snapshot. After creating the subset, click the three dots again, choose the Details option from the dropdown menu, edit the name of the subset and click the Update button.
  2. Click the three dots on the panel and select Query. Then fill in the form in the pop-up window, choosing a negative sentiment score (between -5 and 0) and a date range covering the last two days, and click the Query button. After creating the subset, click the three dots again, choose the Details option from the dropdown menu, edit the name of the subset and click the Update button.
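
Conceptually, the Query in step 2 keeps only the tweets that satisfy every condition in the form. The sketch below shows that idea on made-up data; the field names `sentiment` and `timestamp` and the function `query_subset` are assumptions for illustration and do not reflect how COSMOS stores or queries data internally.

```python
from datetime import datetime, timedelta


def query_subset(tweets, min_score, max_score, since):
    """Keep tweets with min_score <= sentiment <= max_score posted at or after `since`."""
    return [
        t for t in tweets
        if min_score <= t["sentiment"] <= max_score and t["timestamp"] >= since
    ]


# Example with made-up tweets: negative sentiment, posted within the last two days.
now = datetime(2021, 3, 15, 12, 0)
tweets = [
    {"id": 1, "sentiment": -3, "timestamp": now - timedelta(hours=5)},
    {"id": 2, "sentiment": 2, "timestamp": now - timedelta(hours=5)},
    {"id": 3, "sentiment": -1, "timestamp": now - timedelta(days=3)},
]
subset = query_subset(tweets, -5, -1, since=now - timedelta(days=2))
print([t["id"] for t in subset])  # prints [1]
```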

Key Points

  • COSMOS can collect data from Twitter.

  • COSMOS can filter collections based on a variety of criteria.

  • COSMOS can create subsets of the data while the collection continues.


Importing pre-existing data

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • How can we use pre-existing data in COSMOS?

Objectives
  • Use pre-existing Twitter data in COSMOS.

Lesson


After setting up COSMOS on your computer, you can start the software as instructed.

Once COSMOS is launched in your browser, click the plus button on the top left corner. A dropdown menu appears with the options Import Data, Import RSS Feed and Start Twitter Collection. To import pre-existing data into COSMOS:

  1. Click Import Data.
  2. Import a single file into COSMOS by either dragging and dropping it or browsing for it.
  3. Name the dataset and write a description by filling in the form in the pop-up window.
  4. After filling in the form and selecting the file, click the import button.
  5. The imported dataset appears on the Show Panel and is ready to use.

Tip:

  • Only CSV and JSON files are accepted for Twitter data.
  • To visualize and analyse a Twitter dataset in COSMOS, the dataset must contain these entities in this specific order: ‘Language_code’, ‘Timestamp’, ‘Longitude’, ‘Tweet_text’, ‘Tweet_id’, ‘Screen_name’, ‘Latitude’.
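
Before importing a CSV file, it can help to check its header against the column list in the tip. The following is a minimal sketch, not a COSMOS tool: the column names are taken from the tip (with the text column spelled `Tweet_text`), `check_header` is a hypothetical helper, and the sketch assumes that extra columns are allowed as long as the required ones appear in the stated relative order. Verify the exact requirements against your COSMOS version.

```python
import csv
import io

# Column names as listed in the tip above; confirm them for your COSMOS version.
REQUIRED_COLUMNS = [
    "Language_code", "Timestamp", "Longitude", "Tweet_text",
    "Tweet_id", "Screen_name", "Latitude",
]


def check_header(csv_text):
    """Return (missing_columns, in_required_order) for the header row of a CSV string."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    # Keep only the required columns, in the order they occur in the file.
    present = [c for c in header if c in REQUIRED_COLUMNS]
    in_order = not missing and present == REQUIRED_COLUMNS
    return missing, in_order
```

To check a file on disk, read its contents first, for example `check_header(open("tweets.csv", encoding="utf-8").read())`, where `tweets.csv` is a placeholder file name.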

A YouTube video demonstrating how to import pre-existing data into COSMOS is also available.


Exercise

  1. Import and name any pre-existing data on COSMOS following the same steps.

Key Points

  • COSMOS can be used for pre-existing data.

  • COSMOS can only use CSV and JSON data formats.


Visualizations

Overview

Teaching: 30 min
Exercises: 20 min
Questions
  • How do we visualize data in COSMOS?

  • What kinds of data analysis does COSMOS provide?

Objectives
  • Understanding the visualization process in COSMOS.

  • Understanding types of data analysis options with COSMOS.

Lessons


1. Visualize the Data Collection

Once enough data has been collected or imported using COSMOS, it is time to visualize data to obtain the details and information we are looking for. To start visualization:

  1. Drag and drop the data collection onto the workspace.
  2. The table view then opens automatically. This view provides a quick overview of the dataset, covering both the Twitter entities and those identified by COSMOS, such as ‘gender’.

2. Multiple Visualization Views

There are 10 visualization options in COSMOS. When you click the small arrow on the Table button (at the top right-hand side of the page), it shows all the options (List, Table, Text, Word Cloud, Frequency, Network, Map, Pie Chart, Line Chart).

If you want to visualize the same data collection using more than one option:

If you prefer to visualize different datasets, create a new workspace by clicking the plus button near the workspace tab at the top of the page. Once you have created a new workspace, the visualization steps are the same.

a) List

For the list view, you can create a list by selecting a type of metadata. To do that, click the three dots in the top right-hand corner and choose Settings. Then choose the option you need from the drop-down menu.

b) Text

Text view simply shows the text version of the metadata.

c) Word Cloud

For the word cloud view you can;

d) Frequency

There is a separate lesson for frequency view. (See lesson 6)

e) Network

There is a separate lesson for the network view. (See lesson 6)

f) Map

The map view is built from the geospatial metadata of tweets. COSMOS plots data points on the map based on the tweet location (if the user has specified it). The map view can be used to identify hotspots of tweets within the collection. You can use this view to monitor and map the spread of social movements such as specific events, occasions and disease outbreaks.

For the map view you can;

g) Pie Chart

For the pie chart view you can;

i) Line Chart

For the line chart view you can;


A YouTube video covering the whole data visualization process with COSMOS is also available.


Exercises

  1. Create a pie chart representing negative sentiments. Then, create and name a subset for the group of tweets that have a sentiment score of ‘-3’.
  2. Visualize your data collection using the Word Cloud view showing up to 50 words. Then, create and name a subset that includes one of the most used words in the tweet text.
  3. Create a multi-line chart representing positive sentiment and negative sentiment.

3. Drilling down into data

COSMOS provides many ways to create and visualize subsets to suit your research needs.

1. Query

After enough data has been collected, click the three dots on the show panel to filter the data collection by tweet sentiment, date, gender, language and country.

2. Create a subset from the collection

While the collection continues, click the three dots on the show panel and choose Snapshot to create a subset of the data collected up to the time of the snapshot.

3. Create a subset from the visualizations

After creating and naming a subset from any visualization view, you can apply the same visualization steps to the subset you just created. This lets you examine and compare the data in detail. A video tutorial for this lesson is also available.

More on Visualizations

Once you create more than one visualization, selecting data in one view highlights the same data for other views.

For example, say we created one table view and one pie chart view. When we click on one of the pie chart slices (the female one), it highlights the tweets by female users in the Table view.

Tip:

If it takes too long to create one of the visualization views:

  • Click on the menu icon in the top right corner of the page and select Reset Interface.
  • Then, try to create the graph again.

A YouTube video covering the whole data analysis process with COSMOS is also available.


Exercises

  1. Collect 100,000 tweets with the keywords covid, covid19, coronavirus.
  2. Create a subset for tweets posted from Europe using one of the visualisation methods and call it Europe.
  3. Create another subset from Europe for tweets posted by female users and call it female.

Key Points

  • COSMOS provides ways to demonstrate and analyse Twitter data.


Frequency and network analysis

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • How do we use network and frequency views in COSMOS?

Objectives
  • Understanding frequency and network analysis with COSMOS.

Lessons


1. Frequency Analysis

In the frequency view, COSMOS provides three types of frequency analysis: daily, hourly and per minute. You can change the time interval using the slider underneath each graph. This view makes it easy to spot spikes in tweet volume over time. The chart sliders can also be used to create a subset of the data around the dates of these spikes.
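
To make the idea of daily, hourly and per-minute frequencies concrete, the sketch below counts tweets per time bucket with the Python standard library. It only illustrates the concept; the function `tweet_frequency` and the bucket labels are assumptions and are not taken from COSMOS.

```python
from collections import Counter
from datetime import datetime

# strftime patterns that truncate a timestamp to the chosen interval.
BUCKET_FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H:00",
    "minute": "%Y-%m-%d %H:%M",
}


def tweet_frequency(timestamps, interval="hour"):
    """Count tweets per day, hour or minute."""
    fmt = BUCKET_FORMATS[interval]
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Made-up timestamps: three tweets on one day, one on the next.
stamps = [
    datetime(2021, 3, 15, 9, 5),
    datetime(2021, 3, 15, 9, 40),
    datetime(2021, 3, 15, 10, 1),
    datetime(2021, 3, 16, 9, 5),
]
print(tweet_frequency(stamps, "day").most_common(1))  # prints [('2021-03-15', 3)]
```

A spike shows up as a bucket whose count is much higher than its neighbours, which is what the frequency charts let you spot visually.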


A YouTube video demonstrating frequency analysis with COSMOS is also available.


2. Network Analysis

COSMOS provides network analysis of Twitter data. It can currently be used to visualize retweet and mention networks.

These network views aim to identify prominent users and their level of influence in the network. They also show information flows in social media, e.g. the thoughts of prominent users on a particular event, topic or product. Both views (retweet and mention) help to visualize the connections and interactions between users. Network graphs can be created based on the network metrics degree, closeness and betweenness, which can be selected from the drop-down menu on the right-hand side of the page. COSMOS also provides a filter option to adjust the visibility of nodes and labels on the network chart. Using the slider on the right panel, you can adjust how the network chart looks.
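
For readers unfamiliar with the network metrics named above, the sketch below computes two of them, degree and closeness, on a tiny made-up retweet network using only the Python standard library. It uses the textbook definitions on an undirected graph; COSMOS's own calculations may differ (for example in direction or normalisation), and betweenness is left out to keep the example short. All function names and user names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build undirected adjacency sets from (user, other_user) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other users that each user is directly connected to."""
    denominator = max(len(graph) - 1, 1)
    return {user: len(neighbours) / denominator for user, neighbours in graph.items()}


def closeness_centrality(graph, user):
    """Number of reachable users divided by the sum of shortest-path lengths (BFS)."""
    distance = {user: 0}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


# Made-up retweet pairs: "bob" is connected to everyone else.
edges = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("dave", "carol")]
graph = build_graph(edges)
print(degree_centrality(graph)["bob"])     # prints 1.0
print(closeness_centrality(graph, "bob"))  # prints 1.0
```

In this toy network, `bob` scores highest on both metrics, which corresponds to the kind of prominent user the network views are meant to surface.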

As in other COSMOS views, you can create subsets: ‘Ctrl (Command) + Click’ on multiple nodes to select them, then focus on and analyse the network for those specific accounts in depth.


A YouTube video demonstrating network analysis with COSMOS is also available.


Exercise

Create a subset of a collection that contains at least 2 users that affect the network for a specific event.

Key Points

  • COSMOS provides frequency analysis with Twitter data to help us analyse and draw conclusions.

  • COSMOS provides network analysis that gives the opportunity to understand social network structure, the role of key players, communities, etc…


Adding RSS Feeds

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • What is an RSS Feed?

  • How do we add RSS Feeds to COSMOS?

Objectives
  • Adding RSS Feeds to COSMOS.

Lesson


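What is an RSS Feed?

RSS (RDF Site Summary or Really Simple Syndication) is a web feed format that lets users and applications follow updates from many websites in a single news aggregator. To learn more, see:

https://en.wikipedia.org/wiki/RSS#:~:text=RSS%20(RDF%20Site%20Summary%20or,in%20a%20single%20news%20aggregator.

https://www.digitaltrends.com/computing/what-is-an-rss-feed/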

Once COSMOS is launched in your browser, click the plus button on the top left corner. You will see the options Import Data, Import RSS Feed and Start Twitter Collection. To import an RSS Feed into COSMOS:

  1. Click Import RSS Feed. A pop-up window appears on the page.
  2. Find the RSS Feed URL of the website, then copy and paste it into the form.
  3. Name the dataset and write a description by filling in the form in the pop-up window.
  4. After filling in the form, click the import button.
  5. The feed appears on the Show Panel and is ready to drag and drop onto the workspace.

Tip: How to find the RSS Feed URL of a site?

There are several ways to find the RSS Feed URL of a website. You can find the methods and instructions by clicking this link.
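
One common way is to look in the page's HTML source for a `<link rel="alternate" type="application/rss+xml">` tag, which many sites use to advertise their feed. The sketch below does this with Python's standard `html.parser` module on an HTML string. The names `FeedLinkFinder` and `find_feed_urls` and the example URL are made up for illustration; not every site advertises its feed this way, so treat this as one possible approach.

```python
from html.parser import HTMLParser

RSS_TYPE = "application/rss+xml"


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that advertise an RSS feed."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and link_type == RSS_TYPE and href:
            self.feed_urls.append(href)


def find_feed_urls(html_text):
    """Return the RSS feed URLs advertised in an HTML document."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feed_urls


# Made-up page source with one stylesheet link and one feed link.
page = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<link rel="alternate" type="application/rss+xml" '
    'href="https://example.org/feed.xml"/></head><body></body></html>'
)
print(find_feed_urls(page))  # prints ['https://example.org/feed.xml']
```

The URL found this way is what you would paste into the Import RSS Feed form.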


A YouTube video demonstrating how to add RSS feeds to COSMOS is also available.


Challenge 1

Find the RSS Feed URLs for the following websites and start a collection using COSMOS:

  • BBC Sport
  • New York Times Home Page

Solution to challenge 1

  • https://feeds.bbci.co.uk/sport/rss.xml
  • https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml

Key Points

  • COSMOS allows us to collect data from a variety of websites using an RSS Feed.

  • COSMOS can collect RSS Feeds from multiple websites.