Data

A major problem in modern neural methods is the availability of data sources. For high-performance neural systems, large corpora are required; but most of the time these corpora are found by scraping websites or asking companies politely.

Exercise: What is the effect on science reproducibility if the data set is not liberally licensed?

Exercise: give an example of a proprietary data source; what could you do with that data if it were liberally licensed?

OpenStreetMap is one of the largest and most successful open data sources in the world; they use the Open Data License.

Read the Wikipedia article on OpenStreetMap.