DNA Sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. True binding sites are rare compared with non-binding positions, which makes the problem challenging; in the machine learning community, data of this kind is called an imbalanced dataset. Techniques for classifying imbalanced data, such as sampling or filtering, can be applied to this biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
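One of the simplest sampling techniques is random undersampling: keep every positive (binding-site) kmer and randomly discard negatives until the two classes reach a chosen ratio. The sketch below assumes the kmers have already been labeled 1 (binding site) or 0 (non-binding); the function name `undersample` and its `ratio` and `seed` parameters are illustrative, not part of the assignment.

```python
import random
from typing import List, Sequence, Tuple


def undersample(
    kmers: Sequence[str],
    labels: Sequence[int],
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Keep all positives; randomly keep at most ratio * (#positives) negatives."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    pos = [(k, y) for k, y in zip(kmers, labels) if y == 1]
    neg = [(k, y) for k, y in zip(kmers, labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = pos + rng.sample(neg, n_keep)  # sample negatives without replacement
    rng.shuffle(kept)  # avoid all positives appearing first
    return [k for k, _ in kept], [y for _, y in kept]


if __name__ == "__main__":
    kmers = ["ACACGGGA", "CACGGGAg"] + [f"neg{i}" for i in range(10)]
    labels = [1, 1] + [0] * 10
    ks, ys = undersample(kmers, labels)
    print(len(ks), sum(ys))  # 4 kmers kept, 2 of them positive
```

Undersampling should be applied to the training set only; the test set keeps its natural class distribution so that the reported scores reflect real performance. Oversampling the positives (or synthetic methods such as SMOTE) is the usual alternative when discarding negatives throws away too much data.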

The whole dataset should be partitioned into a training set, used to build the learner models, and a test set, used to evaluate how well the classification systems generalize. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training set and the test set.
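A minimal sketch of this evaluation protocol follows. It assumes binary labels (1 = binding site, 0 = non-binding) and a stratified split, so that the rare positives appear in both the training and the test set. The helper names, the 30% test fraction, and the reading of "recognition rate" as overall accuracy (the fraction of kmers classified correctly) are assumptions, not definitions taken from the assignment.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F-measure and recognition rate (assumed: accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    recognition_rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": recognition_rate,
    }


if __name__ == "__main__":
    # One true positive, one false negative, one false positive, three true negatives:
    # precision, recall and F-measure are 0.5; recognition rate is 4/6.
    print(evaluate([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

Because positives are rare, recognition rate alone can look high even for a classifier that never predicts a binding site; precision, recall and F-measure on the positive class are the more informative numbers here.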

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer (a substring of length k) is counted as a motif instance if its location overlaps a true binding site in the DNA sequence by at least 50%. For example, consider the two occurrences of the true binding site ACACGGGA (shown in uppercase) in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that a learner model classifies the 8mers acaaACAC and ACGGGAtc as binding sites. Both count as correct predictions, because they overlap the true binding sites in sequence (1) by 50% (4 of 8 bases) and 75% (6 of 8 bases), respectively. Conversely, if a classifier labels them as non-binding sites, both count as incorrect predictions, because each has at least 50% overlap with a true binding site. Now take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it overlaps the true binding site ACACGGGA by only 25% (2 of 8 bases).
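The overlap rule can be applied mechanically to every kmer in a sequence. The sketch below assumes, as in sequence (1), that true binding sites are marked by uppercase bases and background sequence by lowercase; the real dataset may record site positions differently, in which case `site_intervals` would be replaced by a suitable parser (with this uppercase convention, two directly adjacent sites would also merge into one run). Overlap is measured as a fraction of the kmer length k, which here equals the fraction of the site because both are 8 bases long. All function names are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def site_intervals(seq: str) -> List[Interval]:
    """Locate binding sites, assumed to be maximal runs of uppercase bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", seq)]


def overlap_fraction(start: int, k: int, sites: List[Interval]) -> float:
    """Largest overlap of the kmer [start, start + k) with any single site, over k."""
    end = start + k
    best = 0
    for s, e in sites:
        # min(end, e) - max(start, s) is <= 0 when disjoint; best never drops below 0
        best = max(best, min(end, e) - max(start, s))
    return best / k


def label_kmers(
    seq: str, k: int = 8, threshold: float = 0.5
) -> List[Tuple[str, int, float, int]]:
    """Slide a window of length k; label 1 if overlap >= threshold, else 0."""
    sites = site_intervals(seq)
    rows = []
    for i in range(len(seq) - k + 1):
        frac = overlap_fraction(i, k, sites)
        rows.append((seq[i : i + k], i, frac, int(frac >= threshold)))
    return rows


if __name__ == "__main__":
    seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
    rows = label_kmers(seq)
    for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
        _, pos, frac, label = rows[seq.index(kmer)]
        print(kmer, pos, frac, label)
```

For sequence (1) this labels acaaACAC (overlap 0.5) and ACGGGAtc (overlap 0.75) as 1 and GAgaatta (overlap 0.25) as 0, matching the worked example. A classifier's prediction for a kmer is then scored as correct exactly when it agrees with this label.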


All rights reserved. Copyright © 2019-2020 ExpertsMind IT Educational Pvt Ltd