DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
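As one illustration of the sampling idea, the sketch below performs random undersampling of the majority (non-binding) class. It is only an example under stated assumptions, not a method prescribed by the assignment: the function name `undersample`, the `ratio` parameter, the label convention (1 = binding site, 0 = non-binding site) and the toy kmers are all hypothetical.

```python
import random


def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of the negatives.

    ratio is the number of negatives kept per positive (assumed parameter).
    Labels are assumed to be 1 for binding sites and 0 otherwise.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 2 positives among 10 kmers (made up for illustration).
kmers = ["ACACGGGA", "ccttacac", "cttacaca", "ttacacaa", "tacacaaA",
         "ACACGGGA", "gaattaat", "aattaatA", "tcagatca", "cagatcaa"]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

balanced_kmers, balanced_labels = undersample(kmers, labels, ratio=1.0)
print(len(balanced_kmers), sum(balanced_labels))  # 4 kept, 2 of them positive
```

Resampling of this kind is normally applied to the training dataset only, so that the test dataset keeps the original class distribution.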

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
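The four measures can be computed from the counts of true and false positives and negatives. The sketch below makes two assumptions that the assignment text does not spell out: "recognition rate" is taken to mean overall accuracy, and "F-measure" is taken to be the balanced F1 score. The function name `evaluate` and the toy labels are illustrative only.

```python
def evaluate(y_true, y_pred):
    """Recall, precision, F-measure and recognition rate for binary labels.

    Labels are assumed to be 1 (binding site) or 0 (non-binding site).
    Recognition rate is assumed to mean accuracy; F-measure is assumed to be F1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never present or predicted.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": rate}


# Toy example: 2 true sites among 8 kmers, one found, one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

On this toy example the recognition rate is 0.75 while recall and precision are both 0.5, which shows why accuracy alone can look flattering on an imbalanced dataset.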

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they overlap the true binding sites by at least 50%. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
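The overlap rule can be checked mechanically. The following sketch reproduces the three cases above on sequence (1). The function names and the 0-based site coordinates are assumptions introduced for illustration, and overlap is measured as a fraction of the kmer length, which coincides with the fraction of the site here because both are 8 bases long.

```python
def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the kmer's positions that lie inside one binding site."""
    kmer_end = kmer_start + k
    site_end = site_start + site_len
    shared = max(0, min(kmer_end, site_end) - max(kmer_start, site_start))
    return shared / k


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true site by at least the threshold."""
    return any(overlap_fraction(kmer_start, k, s, n) >= threshold
               for s, n in sites)


sequence = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
# 0-based (start, length) of the two ACACGGGA sites in sequence (1).
sites = [(10, 8), (26, 8)]

results = {}
for start in (6, 28, 16):
    kmer = sequence[start:start + 8]
    best = max(overlap_fraction(start, 8, s, n) for s, n in sites)
    results[kmer] = (best, is_motif_instance(start, 8, sites))
    print(kmer, best, results[kmer][1])
# acaaACAC 0.5 True
# ACGGGAtc 0.75 True
# GAgaatta 0.25 False
```

Labels produced this way serve as the ground truth against which the classifier's predictions are compared when computing the evaluation measures.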



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd