CATH database

What is CATH?

The CATH Protein Structure Classification is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues,[1] and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource, but the detailed classification differs greatly in many areas.

CATH
Content
Description: Protein Structure Classification
Contact
Research center: University College London
Laboratory: Institute of Structural and Molecular Biology
Primary citation: Sillitoe et al. (2015)[2]
Release date: 1997
Access
Website: http://www.cathdb.info/
Download URL: http://www.cathdb.info/download
Miscellaneous
Data release frequency: CATH-B is released daily. Official releases are approximately annual.
Version: 4.1

How is CATH created?

Experimentally determined protein three-dimensional structures are obtained from the Protein Data Bank and split into their consecutive polypeptide chains, where applicable. Protein domains are identified within these chains using a mixture of automatic methods and manual curation. The domains are then classified within the CATH structural hierarchy. At the Class (C) level, domains are assigned according to their secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure. At the Architecture (A) level, the arrangement of the secondary structure elements in three-dimensional space is used for assignment. At the Topology/fold (T) level, information on how the secondary structure elements are connected and arranged is used. Finally, domains are assigned to the same Homologous superfamily (H) if there is good evidence that they are related by evolution, i.e. they are homologous.

The four main levels of the CATH hierarchy:
# | Level | Description
1 | Class | the overall secondary-structure content of the domain (equivalent to the SCOP Class)
2 | Architecture | a large-scale grouping of topologies which share particular structural features
3 | Topology/fold | high structural similarity but no evidence of homology (equivalent to the 'fold' level in SCOP)
4 | Homologous superfamily | indicative of a demonstrable evolutionary relationship (equivalent to SCOP superfamily)
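
The nesting of the four levels can be illustrated with a short Python sketch. It assumes that a domain's classification is written as four dot-separated integers in C.A.T.H order (for example 3.40.50.300) and that classes 1 to 4 follow the order in which the classes are listed above; neither detail is spelled out in this article, and the function names are hypothetical.

```python
from typing import NamedTuple, Optional

# Names of the four hierarchy levels, from broadest to most specific.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Assumed numbering of the classes (not stated in this article).
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "little secondary structure",
}


class CathCode(NamedTuple):
    """One classification, e.g. 3.40.50.300 -> (3, 40, 50, 300)."""
    c: int
    a: int
    t: int
    h: int


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code into its four integer levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def deepest_shared_level(x: CathCode, y: CathCode) -> Optional[str]:
    """Return the most specific level two codes have in common, or None."""
    shared = None
    for level, vx, vy in zip(LEVELS, x, y):
        if vx != vy:
            break  # levels are nested, so stop at the first difference
        shared = level
    return shared


if __name__ == "__main__":
    first = parse_cath_code("3.40.50.300")
    second = parse_cath_code("3.40.50.720")  # illustrative codes only
    print(CLASS_NAMES[first.c])
    print(deepest_shared_level(first, second))
```

Because the levels are nested, two domains can only share a homologous superfamily if they already share class, architecture and topology, which is why the comparison stops at the first level that differs.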

Additional sequence data for domains with no experimentally determined structures are provided by CATH's sister resource, Gene3D, and are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH hidden Markov models (HMMs) to predict domain sequence boundaries and make homologous superfamily assignments.

CATH releases

The CATH team aim to provide official releases of the CATH classification every 12 months. This release process is important because it allows for internal validation, extra annotations and analysis. However, it can mean a delay between new structures appearing in the PDB and the latest official CATH release.

To address this, CATH-B provides a limited amount of information on the very latest domain annotations (e.g. domain boundaries and superfamily classifications).

The latest version of CATH-Gene3D (v4.1) was released in July 2016.

Open source software

CATH is proud to be a member of the open source software community. Its developers use, and contribute to the development and maintenance of, a number of open source tools. For a full list of the open source software used in the making of this resource (both in the pipeline and web pages), please visit the CATH tools page.

CATH TODO (on GitHub)

This project exists to allow external users (that's you) to create and keep track of issues relating to the CATH protein structure classification.

Everyone is encouraged to add issues, no matter how trivial or complex they may seem. Perhaps you have a query because some part of the documentation is unclear, a wishlist for a feature you would love to see, or a problem or bug encountered on the web pages (hopefully not). Either way, all feedback is very welcome.

No timeline can be guaranteed for implementing feature requests, but the level of community interest in any particular issue will be taken into account as the work is prioritised.

Tutorials and FAQs

Tutorials are available that explain how to search the CATH resource, along with answers to the most frequently asked questions.

Social media

Automated tweets are posted weekly to let users know the statistics from the last week regarding the number of new protein domains identified and the number of newly classified protein domains, i.e. domains with a 'CATH' code. The Orengo group also tweet information on new publications and other interesting things going on.

References

  1. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (1997). "CATH – a hierarchic classification of protein domain structures". Structure. 5 (8): 1093–1108. doi:10.1016/S0969-2126(97)00260-8. PMID 9309224.
  2. Sillitoe I, Lewis TE, Cuff A, Das S, Ashford P, Dawson NL, Furnham N, Laskowski R, Lee D, Lees JG, Lehtinen S, Studer RA, Thornton J, Orengo CA (2015). "CATH: comprehensive structural and functional annotations for genome sequences". Nucleic Acids Research. 43: D376–D381. doi:10.1093/nar/gku947.

This article is issued from Wikipedia - version of the 11/10/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.