As AI and ML studies increase in popularity, university students must gain insights into data collection practices to thrive as terms resume, says web intelligence acquisition solution provider Oxylabs.

The alternative data industry continues to grow and is expected to reach £120bn by 2030. Recent research has revealed that over half of UK financial companies use automated processes, such as web scraping, to gather alternative data.

Juras Juršėnas, COO at Oxylabs, said:

“To ensure that data-driven technologies and industries reach their full potential, web intelligence needs to get the right exposure and education, and to achieve this, it must be taught at universities. Doing so will allow students to thrive in the new age of big data and AI.”

“Scraping public web data is essential to the study of artificial intelligence (AI) and machine learning (ML),” said Juršėnas. “AI and ML studies are becoming very popular, with top universities offering dedicated courses. Students in these programs often lack the proper datasets to develop and train ML algorithms, and web scraping knowledge would help them build quality datasets for more efficient work.”

To fill the gap and help academics gather big data using web intelligence solutions, Oxylabs launched a pro bono initiative called “Project 4β”. The initiative aims to transfer the technological expertise accumulated over the years and grant universities and NGOs free access to data scraping tools, supporting research on big data that tackles important social missions. With this in place, universities are in the perfect position to educate their students.

Juršėnas continued:

“Despite the lack of awareness about ethical web scraping practices, the possibilities of alternative web data analysis in social, economic, or psychological studies are endless. For example, investigative journalists and political scientists use alternative data to study a wide range of issues, from tracking the influence of lobbyists by investigating visitor logs from government buildings to monitoring prohibited political ads and extremist groups on public social media platforms and forums. Automated big data gathering is the breakthrough many scientists have been waiting for, but the practice still suffers from several misconceptions.

“Web intelligence is a new industry, surrounded by various legal concerns, and it still lacks clear regulation. This is discouraging some researchers from leveraging public web data in their studies. There is nothing inherently illegal about web scraping if data is gathered ethically, since it automates activities that people would otherwise do manually. For example, you can go to an e-commerce website and try to write down all product descriptions by hand, on a piece of paper. Or you can save time and gather this data automatically.”

Juršėnas added:

“If data is supplied by a reputable scraping solutions provider, most of the risks are nullified or greatly reduced. However, organizations should always consult legal professionals before scraping to minimize the associated risks.”

Together with other prominent web intelligence companies, Oxylabs introduced the Ethical Web Data Collection Initiative to promote ethical data-gathering practices and industry-wide standards. The organization aims to build trust around web scraping and educate a wider tech community about extensive data possibilities.

Juršėnas concluded:

“Web scraping has yet to gain traction in the public eye and academia. However, with the sheer volume of web data increasing exponentially, web intelligence will slowly become an inevitable part of scientific research. Just as it is routine to teach SPSS basics on social sciences campuses, it should become normal to familiarise students with web scraping practices.”
