Data collection screen
Overview
Teaching: 20 min
Exercises: 10 min
Questions
How do we collect data from Twitter?
How can we specify criteria for a data collection?
Objectives
Understanding the data collection process.
Before We Start
- Set up COSMOS.
- Authorise your Twitter account to collect data.
- Check if there is an update.
Lessons
1) Collection Process
The Twitter API allows COSMOS to stream public Tweets from the platform in real time. COSMOS uses the filtered stream endpoints to narrow the stream to the criteria you define in the COSMOS interface, such as keywords, hashtags, account names, language and place (see the Data Collection Screen section). To get more information about filtering real-time tweets, follow the link below:
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/basic-stream-parameters
Creating a filter is helpful because it prevents you from collecting unwanted tweets for your research.
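As a rough illustration of how filter criteria could be turned into a streaming request, the sketch below joins form-style entries into the comma-separated `track`, `follow` and `language` parameters described in the Twitter guide linked above. The function name `build_stream_params` is hypothetical, and whether COSMOS builds its requests this way is an assumption. Note that `follow` expects numeric user IDs rather than account names.

```python
def build_stream_params(keywords=(), account_ids=(), languages=()):
    """Join filter entries into comma-separated streaming parameters.

    Hypothetical helper for illustration only; not taken from COSMOS.
    """
    params = {}
    if keywords:
        # Keywords and hashtags go into `track`.
        params["track"] = ",".join(keywords)
    if account_ids:
        # `follow` takes numeric user IDs, not screen names.
        params["follow"] = ",".join(str(a) for a in account_ids)
    if languages:
        # Language codes such as "en".
        params["language"] = ",".join(languages)
    return params


# 12345 is a made-up user ID used only for this example.
params = build_stream_params(["covid", "vaccine"], [12345], ["en"])
print(params)
```

Criteria that are left empty are simply omitted from the resulting dictionary.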
After you start a collection through COSMOS, you will begin receiving a stream of data. All streaming data is saved temporarily in JSON format in the /home/COSMOS-files/tmp folder. For each collection, COSMOS creates a folder with the same name as the collection inside the tmp folder. Folder structure:
<%collection_year%>/<%collection_month%>/<%collection_day%>/twitter-collection-<%hour%>.json
Since the COSMOS interface does not show every entity in the raw streamed data, the temporary files can be useful if you need to access those entities. Later, COSMOS creates a local database on your machine and moves the dataset from the tmp folder into the database.
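If you want to inspect the temporary files yourself, the folder structure above can be walked with a short script. This is a minimal sketch: the function name `list_hourly_files` is hypothetical, and it assumes the files sit exactly three levels (year/month/day) below the collection folder, as the pattern above suggests.

```python
from pathlib import Path


def list_hourly_files(collection_name, tmp_root="/home/COSMOS-files/tmp"):
    """Return the hourly JSON files of one collection, sorted by path.

    Assumes the layout <year>/<month>/<day>/twitter-collection-<hour>.json
    inside the collection's folder.
    """
    collection_dir = Path(tmp_root) / collection_name
    return sorted(collection_dir.glob("*/*/*/twitter-collection-*.json"))


# "my-collection" is a placeholder collection name.
for path in list_hourly_files("my-collection"):
    print(path)
```

The sort is by path, so the order is only chronological if the month, day and hour parts are zero-padded, which the source does not state.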
2) Data Collection Screen
After setting up COSMOS on your machine, you can start the software as instructed.
Once COSMOS is launched in your browser, click the plus button on the top left corner. You will see the options Import Data, Import RSS Feed and Start Twitter Collection. You can generate a Twitter collection by filtering or sampling:
Filter: COSMOS provides a filtering feature to narrow a collection while streaming data from Twitter.
- Click Start Twitter Collection.
- Choose the collecting type as Filter.
- Give an appropriate name for the collection on the Twitter Collector pop-up window.
- Start the collection based on filtering criteria: Keywords and hashtags, Language, Location, Twitter accounts and Maximum number of tweets by filling the form.
- While filling the Keywords and hashtags, Twitter accounts and Language sections, you should press Enter after typing each entry. Once you press Enter, it should turn blue.
- When the form is filled, click the submit button. Once the collection starts, it appears on the show panel.
Tip:
- When you enter multiple keywords, hashtags or accounts when filtering the collection, the Twitter API determines which results to return using logical OR. For example, if you start a collection with the keywords covid, vaccine and coronavirus and the Twitter account @BorisJohnson, you would get:
  - tweets posted by Boris Johnson (they do not have to be related to covid),
  - tweets containing the keywords covid, vaccine or coronavirus.
- Unlike keywords, hashtags and accounts, the language search term uses logical AND. If we add an English language filter to the example collection above, you will get only English tweets that match the keywords or accounts. You can learn more about this at https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/basic-stream-parameters
Sample: COSMOS can stream tweets without any filter or specification.
- Click Start Twitter Collection.
- Choose the collecting type as Sample.
- Specify the Collection name, Collection description and Maximum number of tweets, then click the Submit button.
- Once you click, the sample dataset will appear on the show panel.
3) More on Collections
The collection takes time as it streams tweets in real-time.
While the collection continues, clicking the three dots on the show panel lets you:
- Stop the collection when enough data has been collected.
- Snapshot the collection to create a subset consisting of data which has been collected until the snapshot.
When the collection has been stopped, clicking the three dots on the show panel gives you these options:
- Query: You can filter your collected data based on tweet sentiment, date, gender, language and country. When you query the dataset, it creates a subset of the data based on the query details. This feature helps to remove noisy data and shorten the data analysis process.
- Export Data Details
- Export collected data: Exports collected data as a CSV file.
- Delete
- Details of data set: Exports the dataset’s details as a JSON file, e.g. date started, name of the collection, …
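To make the idea of a Query subset concrete, the sketch below keeps only the tweets that fall inside a sentiment range and were posted on or after a start date. It is an illustration of the concept, not COSMOS's implementation: the function name `query_subset` and the fields `sentiment` and `created_at` are hypothetical.

```python
from datetime import datetime, timedelta


def query_subset(tweets, sentiment_range=None, since=None):
    """Return the tweets that satisfy every criterion that was given."""
    subset = []
    for tweet in tweets:
        if sentiment_range is not None:
            low, high = sentiment_range
            if not (low <= tweet["sentiment"] <= high):
                continue
        if since is not None and tweet["created_at"] < since:
            continue
        subset.append(tweet)
    return subset


# Made-up data for illustration.
now = datetime(2021, 5, 3, 12, 0)
tweets = [
    {"id": 1, "sentiment": -3, "created_at": now - timedelta(hours=5)},
    {"id": 2, "sentiment": 2, "created_at": now - timedelta(hours=5)},
    {"id": 3, "sentiment": -1, "created_at": now - timedelta(days=4)},
]

# Negative tweets from the last two days.
recent_negative = query_subset(tweets, sentiment_range=(-5, 0), since=now - timedelta(days=2))
print([t["id"] for t in recent_negative])
```

The original list is left untouched, which mirrors the way a query creates a new subset rather than changing the collection.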
You can also watch a YouTube video showing all COSMOS data collection processes by clicking the image below.
Tip:
- A maximum of two data collections can be started simultaneously because of Twitter API rate limits. You can learn more about rate limits in the Twitter API documentation.
Exercise
Collect 10000 English tweets with COSMOS 2.0, filtering by keywords and by Twitter accounts.
Exercise
While the collection continues:
- Create a subset and give it an appropriate name. Which COSMOS feature did you use for this purpose?
- Create and name a subset containing only tweets with negative sentiment from the last two days. Which COSMOS feature did you use for this purpose?
Solution
- Click the three dots on the panel and select Snapshot. After creating the subset, click the three dots again, choose the Details option from the dropdown menu, edit the name of the subset and hit the Update button.
- Click the three dots on the panel and select Query. Then fill in the form on the pop-up window, choosing a negative sentiment score (between -5 and 0) and entering dates covering the last two days, and click the Query button. After creating the subset, click the three dots again, choose the Details option from the dropdown menu, edit the name of the subset and hit the Update button.
Key Points
COSMOS can collect data from Twitter.
COSMOS can filter collections based on a variety of criteria.
COSMOS can create subsets of the data while the collection continues.