Task Hierarchy 138K
Sebastin Santy
Rishabh Mehrotra
Emine Yilmaz
[Web Demo]
[Dataset]


Check the interactive version of the above hierarchy here


Description

We use wikiHow as the source of task information for building the external hierarchical ontology, as it is one of the most data-rich task-solving online communities. Each wikiHow article can be interpreted as a task, so an ontology that groups all articles provides a natural basis for the hierarchy. Each article belongs to a specific category ladder, that is, a path from a top-level category down to the article's most specific subcategory. Each node in the constructed hierarchy represents a category, and its child nodes represent its subcategories. We construct and release this hierarchy with reproducibility in mind, so that it can serve as a baseline for future task-based work.


Examples

Article | Category Tree
Pitch a Baseball | Sports and Fitness > Team Sports > Baseball > Pitching
Clean a Coffee Maker | Food and Entertaining > Drinks > Coffee
Motivate Your Child | Family Life > Parenting > Behavioral Issues
See the White House | Travel > Destinations > North America Travel > United States Travel > Washington D.C. Travel
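
The category ladders in these examples can be folded into a tree in which each node is a category and its children are its subcategories. The following is a minimal sketch of that idea, not the released construction script. The function name `build_hierarchy`, the nested-dict layout, and the `_articles` key are assumptions made for illustration and may not match the schema of the TaskHierarchy JSON file.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical key under which a category stores its article titles.
# It shares a dict with subcategory names, so it would collide with a
# category literally named "_articles".
ARTICLES_KEY = "_articles"


def build_hierarchy(rows: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Fold (article title, 'A > B > C' ladder) pairs into nested dicts."""
    root: Dict[str, Any] = {}
    for title, ladder in rows:
        node = root
        # Walk down the ladder, creating missing categories on the way.
        for category in (part.strip() for part in ladder.split(">")):
            node = node.setdefault(category, {})
        # Attach the article to the most specific category.
        node.setdefault(ARTICLES_KEY, []).append(title)
    return root


rows = [
    ("Pitch a Baseball", "Sports and Fitness > Team Sports > Baseball > Pitching"),
    ("Clean a Coffee Maker", "Food and Entertaining > Drinks > Coffee"),
    ("Motivate Your Child", "Family Life > Parenting > Behavioral Issues"),
    ("See the White House",
     "Travel > Destinations > North America Travel > United States Travel"
     " > Washington D.C. Travel"),
]

tree = build_hierarchy(rows)
```

With this layout, `tree["Food and Entertaining"]["Drinks"]["Coffee"]["_articles"]` holds the titles filed under the Coffee category.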



Statistics

Maximum Depth: 8
No. of Nodes (Categories): 4657
No. of Leaf Nodes: 3461
No. of Articles present in the hierarchy: 138092
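
Statistics of this kind can be recomputed with a single traversal of the tree. The sketch below runs on a toy tree built from two of the example ladders, not on the released file. The `Category` class and `hierarchy_stats` function are hypothetical names, and the depth convention (a virtual root at depth 0 that is not counted as a category) is an assumption, since the page does not state how the maximum depth of 8 was measured.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Category:
    """Hypothetical in-memory node: a category with subcategories and articles."""
    name: str
    children: List["Category"] = field(default_factory=list)
    articles: List[str] = field(default_factory=list)


def hierarchy_stats(root: Category) -> Dict[str, int]:
    """Count categories, leaf categories, articles, and maximum depth.

    Assumption: `root` is a virtual node at depth 0 and is not itself
    counted as a category or as a leaf.
    """
    max_depth = nodes = leaves = articles = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > 0:
            nodes += 1
            if not node.children:
                leaves += 1
        articles += len(node.articles)
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in node.children)
    return {
        "max_depth": max_depth,
        "nodes": nodes,
        "leaf_nodes": leaves,
        "articles": articles,
    }


toy_root = Category("ROOT", children=[
    Category("Food and Entertaining", children=[
        Category("Drinks", children=[
            Category("Coffee", articles=["Clean a Coffee Maker"]),
        ]),
    ]),
    Category("Family Life", children=[
        Category("Parenting", children=[
            Category("Behavioral Issues", articles=["Motivate Your Child"]),
        ]),
    ]),
])

stats = hierarchy_stats(toy_root)
```

On the toy tree this yields 6 categories, 2 leaf categories, 2 articles, and a maximum depth of 3 under the stated convention.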



Download Material

TaskHierarchy [35 MB]
TaskHierarchy in JSON format. Contains the wikiHow category hierarchy along with the article titles and URLs.
Articles [1.35GB]
Articles scraped in HTML format. Only the body of each article is included; JavaScript and style tags are removed.
Code
Script for parsing the different sections of the articles, such as article statistics, task steps, and the information section.
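
As an illustration of the kind of cleanup described for the Articles archive, the sketch below drops the contents of script and style tags and collects the remaining text using only the Python standard library. It is not the released parsing script: the class name `VisibleTextExtractor`, the helper `visible_text`, and the sample HTML string are all made up for this example, and real wikiHow markup is considerably more involved.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collect text chunks while ignoring <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped tag
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text(html: str) -> List[str]:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Invented sample markup, for illustration only.
sample = (
    "<div><h1>Clean a Coffee Maker</h1>"
    "<script>var x = 1;</script>"
    "<style>p { color: red; }</style>"
    "<p>Rinse the carafe.</p></div>"
)

chunks = visible_text(sample)
```

Splitting the remaining text into sections such as task steps would need selectors specific to the article markup, which this sketch does not attempt.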



Disclaimer

This dataset includes information from the wikiHow website. Use of this data must respect https://www.wikihow.com/wikiHow:Terms-of-Use. The data is licensed under CC BY 3.0 by wikiHow.

We do not own the copyright to any of the data used in building the hierarchy. We provide the hierarchy solely for researchers and educators who wish to use the dataset for non-commercial research and/or educational purposes.