

1 Introduction

1. Imagine we have two possibilities: We can scan and email the image, or we can use an optical character reader (OCR) and send the text file. Discuss the advantages and disadvantages of the two approaches in a comparative manner. When would one be preferable over the other?

The text file is typically shorter than the image file, but a faxed document can also contain diagrams, pictures, etc. After using an OCR, we lose properties such as font, size, etc. (unless we also recognize and transmit such information), or the personal touch if it is handwritten text. OCR may not be perfect, and for ambiguous cases, OCR should identify those image blocks and transmit them as they are. A fax machine is cheaper and easier to find than a computer with scanner and OCR software.

OCR is good if we have high-volume, good-quality documents; for documents of a few pages with a small amount of text, it is better to transmit the image.

2. Let us say we are building an OCR and for each character, we store the bitmap of that character as a template that we match with the read character pixel by pixel. Explain when such a system would fail. Why are barcode readers still used?

Such a system allows only one template per character and cannot distinguish characters from multiple fonts, for example. There are standardized fonts such as OCR-A and OCR-B, the fonts we typically see on the packaging of things we buy, which are used with OCR software; the characters in these fonts have been slightly changed to minimize the similarities between them. Barcode readers are still used because reading barcodes is still a better (cheaper, more reliable, more available) technology than reading characters in arbitrary fonts, sizes, and styles.

(Introduction to Machine Learning, 4e, Ethem Alpaydin, Solution Manual)
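The failure mode of pixel-by-pixel template matching can be illustrated with a minimal sketch. The 5×5 bitmaps, the `TEMPLATES` table, and the helper names below are hypothetical and chosen only for illustration; they are not taken from the book. The point is that a one-pixel shift of an "I" already matches the "L" template better than its own.

```python
# Hypothetical 5x5 bitmaps for illustration: '#' is ink, '.' is background.
TEMPLATES = {
    "I": [
        "..#..",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
    "L": [
        ".#...",
        ".#...",
        ".#...",
        ".#...",
        ".###.",
    ],
}


def mismatches(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(bitmap):
    """Return the label of the stored template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda label: mismatches(TEMPLATES[label], bitmap))


def shift_left(bitmap):
    """Shift every row one pixel to the left, padding with background."""
    return [row[1:] + "." for row in bitmap]


if __name__ == "__main__":
    clean = TEMPLATES["I"]
    shifted = shift_left(clean)
    print(classify(clean))    # the unshifted character matches its own template
    print(classify(shifted))  # the shifted "I" is closer to the "L" template
```

With a single template per character, any change in position, font, or size changes the pixel counts in the same way, which is why such a matcher only works on tightly controlled input.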

3. Assume we are given the task of building a system to distinguish junk email. What is in a junk email that lets us know that it is junk? How can the computer detect junk through a syntactic analysis? What would we like the computer to do if it detects a junk email: delete it automatically, move it to a different file, or just highlight it on the screen?

Typically, text-based spam filters check for the existence/absence of words and symbols. Words such as “opportunity,” “viagra,” “dollars,” and characters such as ‘$’ and ‘!’ increase the probability that the email is spam. These probabilities are learned from a training set of example past emails that the user has previously marked as spam. We see many algorithms for this in later chapters.

Spam filters do not work with 100 percent reliability and may make errors in classification. If a junk mail is not filtered, this is not good, but it is not as bad as filtering a good mail as spam. We discuss how we can take into account the relative costs of such false positives and false negatives later on.

Therefore, mail messages that the system considers as spam should not be automatically deleted but kept aside so that the user can see them if he/she wants to, especially in the early stages of using the spam filter when the system has not yet been trained sufficiently.

Spam filtering is probably one of the best application areas of machine learning, where learning systems can adapt to changes in the ways spam messages are generated.
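As a rough sketch of the idea above, the following uses a simplified naive Bayes score over the words present in a message, learned from emails the user has marked. This is one possible technique, not necessarily the one the book develops later. The training emails, the function names, and the 0.9 threshold are all assumptions made for illustration; the high threshold stands in for the higher cost of flagging a good mail as spam, and flagged mail is set aside rather than deleted.

```python
import math
from collections import Counter


def train(emails):
    """emails: list of (text, is_spam). Count, per class, how many emails contain each word."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in emails:
        totals[is_spam] += 1
        counts[is_spam].update(set(text.lower().split()))
    return counts, totals


def spam_probability(text, counts, totals):
    """Posterior P(spam | words present) with add-one smoothing.

    Simplification: only words that occur in the message are used;
    absent words are ignored. Both classes must have training examples.
    """
    log_odds = math.log(totals[True] / totals[False])
    for word in set(text.lower().split()):
        p_spam = (counts[True][word] + 1) / (totals[True] + 2)
        p_ham = (counts[False][word] + 1) / (totals[False] + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


def route(text, counts, totals, threshold=0.9):
    """Set mail aside (never delete it) only when the filter is quite sure."""
    if spam_probability(text, counts, totals) >= threshold:
        return "spam-folder"
    return "inbox"


# Hypothetical training set of past emails marked by the user.
TRAINING = [
    ("great opportunity earn dollars now !", True),
    ("cheap viagra $ $ !", True),
    ("meeting agenda for monday", False),
    ("lunch on monday ?", False),
]

if __name__ == "__main__":
    counts, totals = train(TRAINING)
    for message in ["earn dollars !", "earn dollars", "meeting on monday"]:
        print(message, "->", route(message, counts, totals))
```

Raising `threshold` trades more unfiltered junk for fewer good mails lost, which matches the asymmetric costs described above.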

4. Let us say we are given the task of building an automated taxi. Define the constraints. What are the inputs? What is the output? How can we communicate with the passenger? Do we need to communicate with the other automated taxis, that is, do we need a “language”?

An automated taxi should be able to pick up a passenger and drive him/her to a destination. It should have some positioning system (GPS/GIS) and should have other sensors (cameras) to be able to sense cars, pedestrians, obstacles, etc. on the road. The output should be the sequence of actions to reach the destination in the shortest time with minimum inconvenience to the passenger. The automated taxi needs to communicate with the passenger to receive commands and may also need to interact with other automated taxis and possibly with a centralized control to exchange information about road traffic or scheduling, load balancing, etc.

5. In basket analysis, we want to find the dependence between two items X and Y. Given a database of customer transactions, how can we find these dependencies? How would we generalize this to more than two items?

This is discussed in section 3.5 of the book.
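The answer defers to the book, so the following is only a hedged sketch of the commonly used association-rule measures, support and confidence, estimated by counting transactions. The transaction data and function names are hypothetical. Allowing X to be a set of items is one natural way to go beyond two items.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Estimate P(Y | X) as support(X and Y) / support(X).

    `x` and `y` are collections of items, so X may hold more than one item.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical customer transactions, one set of items per basket.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "beer", "crisps"},
    {"beer", "crisps"},
    {"bread", "milk", "beer"},
]

if __name__ == "__main__":
    print(support(TRANSACTIONS, {"beer", "crisps"}))
    print(confidence(TRANSACTIONS, {"beer"}, {"crisps"}))
    print(confidence(TRANSACTIONS, {"bread", "beer"}, {"crisps"}))
```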

6. In a daily newspaper, find five sample news reports for each category of politics, sports, and the arts. Go over these reports and find words that are used frequently for each category, which may help you discriminate between different categories. For example, a news report on politics is likely to include words such as “government,” “recession,” “congress,” and so forth, whereas a news report on the arts may include “album,” “canvas,” or “theater.” There are also words such as “goal” that are ambiguous.

I checked the New York Times web page of February 10, 2010, and found the following words. For politics: republican, party, senate, vote, administration; for sports: medal, athlete, freestyle, ski, snowboard; for the arts: show, celebrity, debut, vocal, resonance. News categorization systems have a preprocessing stage to handle suffixes, such as votes vs. vote, or snowboarding vs. snowboard. Note that sports is a metaphor used in politics and in many walks of life that involve competition, so relying on a few keywords is tricky, and one needs to take context into account by employing hundreds or thousands of keywords.

In a class of students, it would be interesting to see the overlap of the words students find.
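A minimal sketch of keyword-based categorization with a suffix-handling preprocessing step might look as follows. The keyword lists are the ones given in the answer; the `stem` rule is a deliberately crude stand-in for a real stemmer, and all function names are assumptions for illustration.

```python
import re

# Keyword lists taken from the answer above.
KEYWORDS = {
    "politics": {"republican", "party", "senate", "vote", "administration"},
    "sports": {"medal", "athlete", "freestyle", "ski", "snowboard"},
    "arts": {"show", "celebrity", "debut", "vocal", "resonance"},
}


def stem(word):
    """Crude suffix stripping (illustrative only): drop a trailing 'ing' or 's'."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def keyword_counts(text):
    """Count, per category, how many tokens of the text are keywords after stemming."""
    tokens = [stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    return {cat: sum(t in words for t in tokens) for cat, words in KEYWORDS.items()}


def categorize(text):
    """Pick the category with the most keyword hits (ties go to the first listed)."""
    counts = keyword_counts(text)
    return max(counts, key=counts.get)


if __name__ == "__main__":
    print(categorize("Senate votes on the party budget"))
    print(categorize("She won a medal in snowboarding"))
```

With only five keywords per category, most real reports would score zero everywhere, which is the practical reason for using far larger vocabularies.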

7. If a face image is a 100 × 100 image, written in row-major order, this is a 10,000-dimensional vector. If we shift the image one pixel to the right, this will be a very different vector in the 10,000-dimensional space. How can we build face recognizers robust to such distortions?

Face recognition systems typically have a preprocessing stage for normalization where the input is centered and possibly resized before recognition. This is generally done by first finding the eyes and then translating the image accordingly. There are also recognizers that do not use the face image as pixels but rather extract structural features from the image, for example, the ratio of the distance between the two eyes to the size of the whole face. Such features would be invariant to translations and size changes.
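The effect of a one-pixel shift, and of centering as a normalization step, can be sketched on a tiny image. Note the substitution: instead of locating the eyes, this sketch centers on the intensity-weighted mean column, which is a simpler stand-in chosen for illustration. The 3 × 7 image and all function names are hypothetical.

```python
def flatten(img):
    """Row-major flattening of a 2D image into one vector."""
    return [p for row in img for p in row]


def distance_sq(a, b):
    """Squared Euclidean distance between two images viewed as vectors."""
    return sum((x - y) ** 2 for x, y in zip(flatten(a), flatten(b)))


def shift(img, dx):
    """Shift columns by dx pixels (positive = right), padding with zeros."""
    width = len(img[0])
    if dx >= 0:
        return [[0] * dx + row[: width - dx] for row in img]
    return [row[-dx:] + [0] * (-dx) for row in img]


def center_horizontally(img):
    """Translate so the intensity-weighted mean column lands on the middle column."""
    width = len(img[0])
    total = sum(flatten(img))
    mean_col = sum(c * p for row in img for c, p in enumerate(row)) / total
    return shift(img, round((width - 1) / 2 - mean_col))


# Hypothetical 3 x 7 "image" with a bright blob left of center.
IMG = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
]

if __name__ == "__main__":
    moved = shift(IMG, 1)
    print(distance_sq(IMG, moved))  # nonzero: the raw vectors differ
    print(distance_sq(center_horizontally(IMG), center_horizontally(moved)))
```

After centering, the original and the shifted image map to the same vector, so a recognizer applied after this step no longer sees the translation.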

8. Take, for example, the word “machine.” Write it ten times. Also ask a friend to write it ten times. Analyzing these twenty images, try to find features, types of strokes, curvatures, loops, how you make the dots, and so on, that discriminate your handwriting from that of your friend.

I leave this to the reader. One personal comment: Over the years, I have noticed that though many Western nations use some version of the Roman alphabet, there are noticeable distinctions between their writing styles. I notice, for example, that French and Italian handwriting is more cursive, connected, and flowery, whereas German, British, and American handwritten characters are stroke-based, formed of separate characters. This may just be my impression, and in a class formed of students from different nationalities, this may be an interesting thing to check.

9. In estimating the price of a used car, it makes more sense to estimate the percent depreciation over the original price than to estimate the absolute price. Why?

Depreciation is generally measured as some percentage of the original price. Cars having different prices may suffer the same percentage loss in the same amount of time or mileage; for example, in Turkey, the general belief is that a car, on average, loses 10 to 20 percent of its value in its first year.
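A small worked example may make the argument concrete. The prices and the 15 percent figure below are made up for illustration (15 percent simply falls inside the 10 to 20 percent range mentioned above), and the function names are hypothetical.

```python
def percent_depreciation(original_price, current_price):
    """Loss in value expressed as a percentage of the original price."""
    return 100 * (original_price - current_price) / original_price


def estimate_price(original_price, predicted_depreciation_pct):
    """Convert a predicted percent depreciation back into an absolute price."""
    return original_price * (1 - predicted_depreciation_pct / 100)


if __name__ == "__main__":
    # Two hypothetical one-year-old cars with very different price tags.
    cheap = (20_000, 17_000)
    expensive = (50_000, 42_500)
    for original, current in (cheap, expensive):
        print(original - current, percent_depreciation(original, current))
    # A single predicted percentage can then be applied to any original price.
    print(estimate_price(30_000, 15))
```

The absolute losses differ by a factor of 2.5, while the percentage is the same for both cars, so the percentage is the more uniform quantity for an estimator to learn.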

10. Consider your purchases from your local supermarket. What are the typical associations between the products you buy? How do they depend on the season? Try to devise general rules that explain your purchasing behavior.

I buy bread regularly irrespective of the season. When I buy beer, I also buy crisps. I buy ice cream in summer, and I buy more tea in winter than in summer.

11. List all the companies or institutions that have some data about you and what they know about you. Check their policies for data security and privacy.

My employer has my personal information. I’m very careful about submitting my personal information over the web, and never share any with any
