Information

Published by ิีbuathip, 2018-05-04 03:49:34

Description: Information

Keywords: information

IST 511 Information Management: Information and Technology
Introduction to IST 511
Dr. C. Lee Giles
David Reese Professor, College of Information Sciences and Technology
Professor of Computer Science and Engineering
Professor of Supply Chain and Information Systems
The Pennsylvania State University, University Park, PA, USA
[email protected]
http://clgiles.ist.psu.edu

What is IST 511?
• Introduction to algorithmic/computational parts of IST
  – There will be some maths
• Guide to research
  – In information and related sciences
  – In IST
  – Illustrate the intellectual diversity of IST
• Methodology
  – Read, view, discuss and write about ideas and papers in the field
    • When possible, use examples of IST 511 research from IST grad students
  – Write a research proposal paper and give a professional presentation
    • Focus on methodologies discussed here

IST 511
• Nearly all course material is at: http://clgiles.ist.psu.edu/IST511
  – If you lose this address, put IST511 into Google or Bing
  – Read this page and its links very carefully at least once a week
• Angel is used so far only for student submissions.
• Important notices will be sent by email with the subject: IST511

Today
• What is information
  – Things - artifacts
  – Use
    • Personal, social, etc.
  – Foundations and representation
  – Information vs knowledge
• Information science vs informatics vs information theory

Tomorrow
Topics considered and used in IST (will consider some, not all)
• Complexity
• Representation
• AI
• Machine learning
• Information retrieval and search
• Text
• Encryption
• Social networks
• Probabilistic reasoning
• Digital libraries
• Others?

Theories in Information Sciences
• Enumerate some of these theories in this course.
• Issues:
  – Unified theory?
  – Domain of applicability
  – Conflicts
• Theories here are mostly algorithmic
  – Automated vs manual
  – Scalable features
    • Google vs iPhone
• Quality of theories
  – Occam’s razor
  – Subsumption of other theories

Past & Recent Headlines
• A Minnesota hacker was sentenced to 18 years in prison on Tuesday for using his neighbors’ wireless network without permission and then framing them for child pornography distribution and email threats against Vice President Joe Biden and other officials.
• “Latest Genealogy Tools Create a Need to Know”
• “Bots Hammer Estonia In Cyber Vendetta”
• “UPS slashed the time it takes to determine the least-expensive route from months and wants to make that information available in real time”
• “Sophisticated internet users continue to fall for spam”
• “Google makes us stupid”
• “Google makes us smarter”
• “IT doesn’t matter”
• “Microsoft and Yahoo unite against Google Book Search”

What is Information?
• There are several ways to define “information”
  – Subjective: People develop models of their environment. Information created by people makes those models more accurate.
  – Thing/artifact: Information is what’s captured in a book, web page, or other resource.
    • More information is digital

Information - wikipedia
• Information as a concept has a diversity of meanings, from everyday usage to technical settings. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.
• Many people speak about the Information Age as the advent of the Knowledge Age or knowledge society, the information society, the Information revolution, and information technologies, and even though informatics, information science and computer science are often in the spotlight, the word "information" is often used without careful consideration of the various meanings it has acquired.

How much information is there in the world?
Informetrics - the measurement of information
• Stored
  – What can we store?
  – What do we intend to store?
  – What is stored?
• How do we use it
  – Decision making

Information Age
• We have entered the information age
  – What is the information age?
• When do we leave it and where do we go next?
  – David Weinberger’s Too Big to Know
  – What information was

Digitization of Everything: the Zettabytes are coming
• Soon most everything will be recorded and indexed
• Much will remain local
• Most bytes will never be seen by humans.
• Search, data summarization, trend detection, information and knowledge extraction and discovery are key technologies
• So will be infrastructure to manage this.

Digital Information Created, Captured, Replicated Worldwide
[Figure: bar chart of exabytes per year (axis 0 to 1,800), 2006 to 2011, annotated "10-fold Growth in 5 Years!". Sources shown: DVD, RFID, Digital TV, MP3 players, Digital cameras, Camera phones, VoIP, Medical imaging, Laptops, Data center applications, Games, Satellite images, GPS, ATMs, Scanners, Sensors, Digital radio, DLP theaters, Telematics, Peer-to-peer, Email, Instant messaging, Videoconferencing, CAD/CAM, Toys, Industrial machines, Security systems, Appliances.]
Source: IDC, 2008

Scale of things to come
• Information growth:
  – In 2002, recorded media and electronic information flows generated about 22 exabytes (EB, 10^18 bytes) of information
  – In 2006, the amount of digital information created, captured, and replicated was 161 EB
  – In 2010, the amount of information added annually to the digital universe was about 988 EB (almost 1 ZB)
• How much of this is information, data or knowledge?
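
The three figures above imply a compound annual growth rate, which can be sketched as follows. This is only a rough illustration: the function name `annual_growth_rate` is made up for this example, and the three numbers come from studies that may not measure exactly the same thing, so the rates are indicative at best.

```python
# Figures quoted on the slide, in exabytes (EB) per year.
volumes_eb = {2002: 22, 2006: 161, 2010: 988}

def annual_growth_rate(v_start, v_end, years):
    """Compound annual growth rate as a fraction (0.5 means 50% per year)."""
    return (v_end / v_start) ** (1 / years) - 1

years = sorted(volumes_eb)
for y0, y1 in zip(years, years[1:]):
    rate = annual_growth_rate(volumes_eb[y0], volumes_eb[y1], y1 - y0)
    print(f"{y0} to {y1}: about {rate:.0%} per year")
```

Under these assumptions both intervals come out at roughly 60% growth per year.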

Digital Universe Environmental Footprint
• In our physical universe, 98.5% of the known mass is invisible, composed of interstellar dust or what scientists call “dark matter.” In the digital universe, we have our own form of dark matter: the tiny signals from sensors and RFID tags and the voice packets that make up less than 6% of the digital universe by gigabyte, but account for more than 99% of the “units,” information “containers,” or “files” in it.
• Tenfold growth of the digital universe in five years will have a measurable impact on the environment, in terms of both power consumed and electronic waste.

How much information is there?
• Soon most everything will be recorded and indexed
• Most bytes will never be seen by humans.
• Data summarization, trend detection, and anomaly detection are key technologies
See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html
See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
[Figure: byte-scale ladder (Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta), annotated in ascending order with: A Book, A Photo, A Movie, All books (words), All Books MultiMedia, Everything Recorded!]
Small prefixes: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli



Information Facts
Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks.
• How big is five exabytes? If digitized with full formatting, the seventeen million books in the Library of Congress contain about 136 terabytes of information; five exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections.
• Hard disks store most new information. Ninety-two percent of new information is stored on magnetic media, primarily hard disks. Film represents 7% of the total, paper 0.01%, and optical media 0.002%.
• The United States produces about 40% of the world's new stored information, including 33% of the world's new printed information, 30% of the world's new film titles, 40% of the world's information stored on optical media, and about 50% of the information stored on magnetic media.
• How much new information per person? According to the Population Reference Bureau, the world population is 6.3 billion, thus almost 800 MB of recorded information is produced per person each year. It would take about 30 feet of books to store the equivalent of 800 MB of information on paper.
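
The comparisons above are simple ratios and can be checked directly. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes); the variable names are illustrative only.

```python
MB = 10**6
TB = 10**12
EB = 10**18

new_info_bytes = 5 * EB          # new information stored in 2002
loc_bytes = 136 * TB             # Library of Congress books, digitized
loc_books = 17e6                 # seventeen million books
world_population = 6.3e9

# How many Library-of-Congress-sized collections fit in five exabytes?
libraries = new_info_bytes / loc_bytes

# New recorded information per person per year, in megabytes.
per_person_mb = new_info_bytes / world_population / MB

# Implied average size of one digitized book, in megabytes.
mb_per_book = loc_bytes / loc_books / MB

print(f"{libraries:,.0f} libraries")
print(f"{per_person_mb:.0f} MB per person")
print(f"{mb_per_book:.0f} MB per book")
```

The results land close to the slide's figures: about 37,000 libraries and just under 800 MB per person. The implied 8 MB per fully formatted book is a derived number, not one stated in the source.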

Information Census (Lesk; Varian & Lyman)
• ~10 Exabytes
• ~90% digital
• > 55% personal
• Print: .003% of bytes, 5 TB/y, but text has lowest entropy
• Email is (10 Bmpd) 4 PB/y and is 20% text (estimate by Gray)
• WWW is ~50 TB, deep web ~50 PB
• Growth: 50%/y

Media       TB/y        Growth Rate, %
optical     50          70
paper       100         2
film        100,000     4
magnetic    1,000,000   55
total       1,100,150   50

First Disk 1956
• IBM 305 RAMAC
• 4 MB
• 50x24” disks
• 1200 rpm
• 100 ms access
• 35k$/y rent
• Included computer & accounting software (tubes not transistors)

1.6 meters 10 years later 30 MB

Now - Terabytes on your desk Terabyte external drive for $200 - 20 cents a gigabyte. In 5 years, 1 cent/gigabyte, $10 for a terabyte?
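
The price-per-gigabyte figure and the five-year projection can be worked through as below. This is a sketch under the assumption that 1 TB = 1000 GB; the helper names `cents_per_gb` and `implied_annual_decline` are invented for illustration.

```python
def cents_per_gb(dollars_per_tb, gb_per_tb=1000):
    """Convert a price in dollars per terabyte to cents per gigabyte."""
    return dollars_per_tb * 100 / gb_per_tb

def implied_annual_decline(price_now, price_later, years):
    """Constant yearly price drop (as a fraction) that takes price_now to price_later."""
    return 1 - (price_later / price_now) ** (1 / years)

today = cents_per_gb(200)        # $200 per terabyte
future = cents_per_gb(10)        # $10 per terabyte
decline = implied_annual_decline(today, future, years=5)

print(f"today: {today:.0f} cents/GB, in 5 years: {future:.0f} cent/GB")
print(f"implied price decline: about {decline:.0%} per year")
```

Going from 20 cents to 1 cent per gigabyte in five years implies a price drop of roughly 45% per year, in the same range as the 50.7%/year decline quoted on the disk-trend slide.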

Now - Terabytes on your desk
Terabyte external drive for $200 - 6 cents a gigabyte. In 5 years, 1 cent/gigabyte, $10 for a terabyte?

Moore's Law
• Defined by Dr. Gordon Moore during the sixties.
• Predicts an exponential increase in component density over time, with a doubling time of 18 months.
• Applicable to microprocessors, DRAMs, DSPs and other microelectronics.
• Monotonic increase in density observed since the 1960s.
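
An 18-month doubling time translates into a yearly growth rate and a cumulative multiplier. The sketch below takes the 18-month figure from the slide as given; the function names are hypothetical.

```python
def density_multiplier(years, doubling_time_years=1.5):
    """Factor by which component density grows after the given number of years."""
    return 2 ** (years / doubling_time_years)

def annual_growth_from_doubling(doubling_time_years=1.5):
    """Equivalent yearly growth rate as a fraction."""
    return 2 ** (1 / doubling_time_years) - 1

print(f"annual growth: {annual_growth_from_doubling():.1%}")
print(f"after 3 years: x{density_multiplier(3):.0f}")
print(f"after 15 years: x{density_multiplier(15):.0f}")
```

A doubling every 18 months corresponds to about 58.7% growth per year, and ten doublings (15 years) give a factor of 1024.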

Moore’s Law - Density

Storage capacity beating Moore’s law
• Improvements:
  – Capacity 60%/y
  – Bandwidth 40%/y
  – Access time 16%/y
• 1000 $/TB today
• 100 $/TB in 2007
Moore's law: 58.70%/year
TB growth: 112.30%/year since 1993
Price decline: 50.70%/year since 1993
Most (80%) data is personal (not enterprise). This will likely remain true.
[Figure: Disk TB Shipped per Year, log scale from 1E+3 to 1E+7 (ExaByte at 1E+6), 1988 to 2000, comparing disk TB growth (112%/y) with Moore's Law (58.7%/y). Source: 1998 Disk Trend (Jim Porter), http://www.disktrend.com/pdf/portrpkg.pdf]
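
The yearly rates on this slide are easier to compare as doubling times. The following sketch assumes the rates stay constant, which is a simplification; `doubling_time` and `years_to_reach` are names made up for this example.

```python
import math

def doubling_time(annual_growth):
    """Years needed to double at a constant yearly growth rate (fraction)."""
    return math.log(2) / math.log(1 + annual_growth)

def years_to_reach(price_now, price_target, annual_decline):
    """Years for a price to fall to price_target at a constant yearly decline (fraction)."""
    return math.log(price_target / price_now) / math.log(1 - annual_decline)

print(f"Moore's law doubling time: {doubling_time(0.587):.2f} years")
print(f"Disk TB shipped doubling time: {doubling_time(1.123):.2f} years")
print(f"1000 $/TB to 100 $/TB: {years_to_reach(1000, 100, 0.507):.1f} years")
```

At 112.3% per year, shipped disk capacity doubles in under a year, against about 18 months for Moore's law. At a 50.7% yearly price decline, a tenfold price drop takes a little over three years.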

Digital Immortality
Bell, Gray, CACM, ‘01
Requirements for storing various media for a single person’s lifetime at modest fidelity

What is Digital Immortality?
• Preservation and interaction of digitized experiences for individuals and/or groups
  – Preservation and access
  – Active interaction with archives through queries and/or an avatar (agents)
  – Avatar interactions for group experiences
• Issues:
  – Archiving
  – Indexing
  – Veracity
  – Access

New Information Flows
• Telephone increase is significant

Internet

All the world’s libraries on your iPod! iPhone
NY Times Magazine
And you thought finding that song was hard.
• Storage is practically free
• Much is mobile
• Access is crucial
• Moore’s law keeps on trucking

Why Put Everything in Cyberspace?
• Low rent: min $/byte
• Shrinks time: now or later
• Shrinks space: here or there
• Automate processing: knowbots
[Figure labels: Point-to-Point OR Broadcast; Immediate OR Time Delayed; Locate, Process, Analyze, Summarize]

Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely”

Trying to fill a terabyte in a year

Item                           Items/TB   Items/day
300 KB JPEG                    3 M        9,800
1 MB Doc                       1 M        2,900
1 hour 256 kb/s MP3 audio      9 K        26
1 hour 1.5 Mbp/s MPEG video    290        0.8
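
The kind of calculation behind this table can be sketched as follows. It assumes decimal units (1 TB = 10^12 bytes) and a 365-day year, so the results land near, but not exactly on, the slide's rounded figures. Only the first three rows are reproduced, and the helper names are illustrative.

```python
TB = 10**12

def items_per_tb(item_bytes):
    """How many items of the given size fit in one terabyte."""
    return TB / item_bytes

def items_per_day(item_bytes, days=365):
    """How many such items must be stored each day to fill a terabyte in `days` days."""
    return items_per_tb(item_bytes) / days

item_sizes = {
    "300 KB JPEG": 300 * 10**3,
    "1 MB Doc": 10**6,
    # 256 kilobits per second for one hour, converted to bytes.
    "1 hour 256 kb/s MP3 audio": 256_000 / 8 * 3600,
}

for name, size in item_sizes.items():
    print(f"{name}: {items_per_tb(size):,.0f} per TB, {items_per_day(size):,.1f} per day")
```

The point of the slide survives the rounding: filling a terabyte in a year takes thousands of photos or documents every day.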

Progress of Science Paradigms
• Thousand years ago: science was empirical
  – describing natural phenomena
• Last few hundred years: theoretical branch
  – using models, generalizations
  – $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
• Last few decades: a computational branch
  – simulating complex phenomena
• Today: data and information exploration (eScience)
  – unify theory, experiment, and simulation - information driven
  – Data captured by sensors, instruments or generated by simulator
  – Processed by software
  – Information/Knowledge stored in computer
  – Scientist analyzes database / files using data management and statistics
  – Network Science
  – Cyberinfrastructure

Information Systems
• An Information System is the system of persons, data records and activities that process the data and information in a given organization, including manual processes or automated processes.
  – Usually the term is used erroneously as a synonym for computer-based information systems, which are only the information technologies component of an Information System.
  – Computer-based information systems are the field of study for Information technologies (IT); however, these should hardly be treated apart from the bigger Information System in which they are always involved.
• The actual system such as a search engine, etc.

The Information Funnel
• Information is nearly always developed to facilitate human needs!
[Figure: funnel with stages Complexity of the World, Capture, Representation, Apply]

Representation as Information: What Makes a Good Representation?
• A straight line can be a good representation for describing some data.
• For other data, a curved (quadratic) line is better.
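
One way to make "good representation" concrete is to fit both a straight line and a quadratic to the same data and compare the leftover error. The sketch below uses ordinary least squares on small made-up data sets; the data, the helper names, and the choice of squared error as the quality measure are all assumptions for illustration.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], where A is the Vandermonde matrix of xs.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sum_squared_error(xs, ys, coeffs):
    """Total squared distance between the data and the fitted polynomial."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

xs = [0, 1, 2, 3, 4, 5]
line_data = [1 + 2 * x for x in xs]               # made-up points on a straight line
curve_data = [1 + 2 * x + 3 * x * x for x in xs]  # made-up points on a parabola

for name, ys in [("line data", line_data), ("curve data", curve_data)]:
    for degree in (1, 2):
        err = sum_squared_error(xs, ys, polyfit(xs, ys, degree))
        print(f"{name}, degree {degree}: squared error = {err:.3f}")
```

For the line data, the straight line already fits with essentially zero error, so the extra quadratic term adds nothing. For the curve data, the straight line leaves a large error that the quadratic removes.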

Types of Representations
• Categories
• Equations
• Language
• Logic statements
• Images
• Mental models

Models (information) of Processes
• Square-wave process
• Modeled by sine wave
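
Modeling a square wave with a sine wave can be illustrated with the Fourier series of the square wave, whose first term is a single sine. The slide does not say which model it has in mind, so the Fourier construction here is an assumption, and the function names are made up.

```python
import math

def square_wave(t):
    """Unit square wave with period 2*pi: +1 on the first half period, -1 on the second."""
    return 1.0 if math.sin(t) >= 0 else -1.0

def sine_model(t, n_terms=1):
    """Partial Fourier series of the square wave using the first n_terms odd harmonics."""
    return (4 / math.pi) * sum(math.sin((2 * k + 1) * t) / (2 * k + 1)
                               for k in range(n_terms))

def rms_error(n_terms, samples=1000):
    """Root-mean-square gap between the square wave and the model over one period."""
    ts = [2 * math.pi * (i + 0.5) / samples for i in range(samples)]
    return math.sqrt(sum((square_wave(t) - sine_model(t, n_terms)) ** 2
                         for t in ts) / samples)

for n in (1, 3, 5):
    print(f"{n} sine term(s): RMS error = {rms_error(n):.3f}")
```

A single sine wave captures the period and rough shape of the process but leaves a sizeable error. Adding more harmonics shrinks the error, at the cost of a more complex model.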

Information Processing
• There are many ways to apply the information stored in representations.
• Retrieval
  – Finding useful information
• Recognition
  – Identifying an instance
• Inference
  – Extend stored information to a new situation

Context
• One of the hardest problems for information processing is determining the context in which the information is applied.
• This may lead to incorrect inferences.
• Some say information is data in context.

People and Information
• People process information based on their experience and context.
• Human information processing is affected by emotions and needs.
• Your data may be my information

What is an information system?
• Processes information
• Requires knowledge of what information is
• How much information is available
  – Static vs dynamic
  – Explicit vs implicit
• How it is used and structured
  – information management
• How it’s managed
• Incorporated into personal or social use.

Information Characteristics
• Structural / Ontological / context
  – State based
• Representations / rules
• Functional / active
• Language / communication
• Personal
• Social

What is knowledge?
• Data - Facts, observations, or perceptions.
• Information - Subset of data, only including those data that possess context, relevance, and purpose.
• Knowledge - A more simplistic view considers knowledge as being at the highest level in a hierarchy with data (at the lowest level) and information (at the middle level).
  • Data refers to bare facts void of context.
    – A telephone number.
  • Information is data in context.
    – A phone book.
  • Knowledge is information that facilitates action.
    – Recognizing that a phone number belongs to a good client, who needs to be called once per week to get his orders.

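From Facts to Wisdom (Haeckel & Nolan, 1993): one example of the hierarchy
[Figure: hierarchy with levels Facts, Information, Intelligence, Knowledge, Wisdom (Wisdom at the top), captioned "Less is More". Axis labels: Volume, Completeness, Objectivity on one side; Value, Structure, Subjectivity on the other.]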

What is knowledge?
• Knowledge - A more complex view considers knowledge as intrinsically different from information. Instead of considering knowledge as a richer or more detailed set of facts, we define knowledge in an area as justified beliefs about relationships among concepts relevant to that particular area.

Is Information
• An aspect of intelligence?
  – Derivative to its use
• An aspect of life?
• Innate to physical reality?
  – Innate code, e.g. DNA, etc.

Characteristics of Information
– Invariant
– Dynamic
– Personal
– Situational
– Cultural
– An act versus a fact
– Additive
– Symbolic
– Others?

