The International Press Telecommunications Council (IPTC) will use a grant from the first round of Google’s Digital News Initiative Innovation Fund to build and freely distribute an initial version of EXTRA: The EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content.
EXTRA will be a classification system for annotating news documents with high-quality subject tags. Such tags will allow publishers to deliver a variety of valuable services including content recommendations, improved advertising targeting and subject-specific content streams, such as alerts and topic pages.
“By creating a freely available rules-based classification engine, IPTC will help publishers to enhance their content with all sorts of metadata services, including enriched search, intelligent recommendations and precise analytics,” said Stuart Myles, chairman of IPTC.
EXTRA will provide news publishers with several key capabilities: the ability to automatically categorize documents by subject (for example, terrorism, sports, names of celebrities); the ability to author classification rule sets tailored to existing taxonomies; and the ability to classify documents using the industry-standard IPTC Media Topics taxonomy. Many news organizations use taxonomies to classify their content. They apply classification in various ways, including to improve online news navigation through grouping and linking, to organize editorial workflows and to enrich search.
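To illustrate what rules-based classification means in practice, here is a minimal sketch in Python. It is not EXTRA's rule language or API, neither of which is described in this announcement. The `Rule` structure, the `classify` function and the subject labels are all hypothetical, and the labels are not actual IPTC Media Topics identifiers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical keyword rule; not EXTRA's actual rule format."""
    topic: str                 # illustrative subject label, not a Media Topics ID
    all_of: tuple = ()         # every one of these terms must appear
    any_of: tuple = ()         # at least one of these terms must appear
    none_of: tuple = ()        # none of these terms may appear


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(rule, tokens):
    """Return True when the token set satisfies all parts of the rule."""
    if not all(term in tokens for term in rule.all_of):
        return False
    if rule.any_of and not any(term in tokens for term in rule.any_of):
        return False
    return not any(term in tokens for term in rule.none_of)


def classify(text, rules):
    """Return the sorted subject labels whose rules match the text."""
    tokens = tokenize(text)
    return sorted({rule.topic for rule in rules if matches(rule, tokens)})


# A tiny, made-up rule set for demonstration only.
RULES = [
    Rule("sport", any_of=("football", "tennis", "olympics")),
    Rule("politics", all_of=("election",),
         any_of=("parliament", "vote", "candidate")),
    Rule("economy", any_of=("inflation", "markets"), none_of=("football",)),
]

if __name__ == "__main__":
    print(classify("The candidate won the election after a late vote count.", RULES))
    print(classify("Football transfer markets heat up", RULES))
```

In this sketch a rule set is plain data, so a publisher could in principle write one rule set per taxonomy and per language without changing the matching code. A production system would likely need richer matching, such as phrases, proximity and stemming.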
So that EXTRA is immediately useful to the news publishing community, IPTC will create different suites of rules in two languages for classifying news documents into the IPTC Media Topics taxonomy, an industry-standard taxonomy used by several leading news providers.
“We hope that the EXTRA project will support a migration in the news publishing community towards a common industry-wide open source platform,” said Michael Steidl, managing director of IPTC. “We believe that a freely available document classification platform will provide great benefit to small-to-medium sized publishers.”
IPTC invites other parties to join the development of EXTRA.
Contact email@example.com to learn more, including how you can get involved.
Google has offered more than €27m to 128 projects, large and small, from 23 countries across Europe, each designed to advance innovation in the news industry. The Digital News Initiative (DNI) is a collaboration between Google and news publishers in Europe to support high-quality journalism and encourage a more sustainable news ecosystem through technology.
About IPTC: The IPTC, based in London, brings together the world’s leading news agencies, publishers and industry vendors. It develops and promotes efficient technical standards to improve the management and exchange of information between content providers, intermediaries and consumers. The standards enable easy, cost-effective and rapid innovation. Visit www.iptc.org and follow on Twitter: @IPTC