DNA Sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, such data is known as an imbalanced dataset. Some techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to look for relevant publications to see how you can build effective classifiers for motif recognition.
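
Random undersampling of the majority (non-binding) class is one of the simplest sampling techniques. The sketch below is only an illustration, not a method prescribed by the assignment: it assumes every kmer already carries a 0/1 label (1 = binding site), and the helper name `random_undersample` and its `ratio` parameter are hypothetical.

```python
import random

def random_undersample(kmers, labels, ratio=1.0, seed=0):
    """Keep every positive kmer plus a random subset of the negatives.

    Hypothetical helper: `ratio` is the number of negatives kept per positive.
    """
    rng = random.Random(seed)
    positives = [k for k, y in zip(kmers, labels) if y == 1]
    negatives = [k for k, y in zip(kmers, labels) if y == 0]
    n_keep = min(len(negatives), round(ratio * len(positives)))
    kept = [(k, 1) for k in positives] + [(k, 0) for k in rng.sample(negatives, n_keep)]
    rng.shuffle(kept)  # avoid all positives sitting at the front
    return [k for k, _ in kept], [y for _, y in kept]

# Toy data: 2 binding-site kmers among 10 background kmers.
kmers = ["ACACGGGA", "CACGGGAg",
         "ccttacac", "cttacaca", "ttacacaa", "tacacaaA", "gaattaat",
         "aattaatA", "tcagatca", "cagatcaa", "agatcaat", "gatcaata"]
labels = [1, 1] + [0] * 10
bal_kmers, bal_labels = random_undersample(kmers, labels)
print(len(bal_kmers), sum(bal_labels))  # 4 2
```

Undersampling throws data away; random oversampling (duplicating the positives) is the mirror-image alternative, and which one works better for motif recognition is worth checking against the literature.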

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by the recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
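
A minimal sketch of the partition and the four measures, assuming labels are 0/1 with 1 meaning binding site, and assuming "recognition rate" means the overall fraction of correctly classified kmers (accuracy). The split is stratified so that the rare positives appear in both partitions; that choice, and the helper names `stratified_split` and `evaluate`, are illustrative rather than required by the assignment.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with a similar positive/negative ratio in both."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(test_fraction * len(idx))
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)

def evaluate(y_true, y_pred):
    """Precision, recall, F-measure and recognition rate (assumed = accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "recognition_rate": rate}

print(evaluate([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
# precision 0.5, recall 0.5, f_measure 0.5, recognition_rate 4/6
```

On imbalanced data the recognition rate alone can look high even when every binding site is missed, which is why recall, precision and F-measure are reported alongside it.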

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location overlaps a true binding site in the DNA sequence by at least 50%. For example, consider the two true binding sites, both ACACGGGA (shown in uppercase), in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that a learner model classifies the 8mers acaaACAC and ACGGGAtc as binding sites. We count these as correct predictions because they overlap the true binding sites in sequence (1) by 50% and 75%, respectively. Conversely, if a classifier labels them as non-binding sites, we count those as incorrect predictions, because each has at least 50% overlap with a true binding site. Now take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it overlaps the true binding site ACACGGGA by only 25%.
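
The overlap rule can be sketched as follows. This assumes, as in sequence (1), that the true binding sites are the uppercase runs of the sequence; in the real dataset the site positions may be supplied differently. The helpers `site_intervals`, `overlap_fraction` and `is_correct` are hypothetical names. The script reproduces the 50%, 75% and 25% overlaps from the example.

```python
import re

def site_intervals(seq):
    """Half-open (start, end) intervals of true binding sites.

    Assumption: sites are written in uppercase, background DNA in lowercase.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[ACGT]+", seq)]

def overlap_fraction(start, k, sites):
    """Largest fraction of the kmer seq[start:start+k] covered by a single site."""
    end = start + k
    best = 0
    for s, e in sites:
        best = max(best, min(end, e) - max(start, s))  # negative if disjoint
    return best / k

def is_correct(predicted_site, start, k, sites, threshold=0.5):
    """A prediction is correct when it agrees with the >= 50% overlap rule."""
    return predicted_site == (overlap_fraction(start, k, sites) >= threshold)

seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = site_intervals(seq)
k = 8
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = seq.find(kmer)
    print(kmer, overlap_fraction(start, k, sites))
# acaaACAC 0.5
# ACGGGAtc 0.75
# GAgaatta 0.25
```

Taking the maximum over sites measures overlap against one binding site at a time, matching the wording "at least 50% overlap with a true binding site".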


All rights reserved. Copyright ©2019-2020 ExpertsMind IT Educational Pvt Ltd