Lori MacVittie’s posterous

Random Postings on Application Delivery, Development, and SOA 

Cloudware and information privacy: TANSTAAFL

Ars Technica is reporting on a recent Pew study on cloud computing and privacy, specifically concerning remote data storage and the kind of data mining performed on it by providers like Google. The study indicates that while consumers are concerned about the privacy of their data in the cloud, they still subject themselves to what many consider to be an invasion of privacy and misuse of data.

68 percent of respondents who said they'd used cloud services declared that they would be "very" concerned, and another 19 percent at least "somewhat" concerned, if their personal data were analyzed to provide targeted advertising.  This, of course, is precisely what many Web mail services, such as Google's own Gmail, do—which implies that at least some of those who profess to be "very" concerned about the practice are probably nevertheless subjecting themselves to it.

One wonders why those who profess to be very concerned about the privacy and data-mining tactics of cloudware providers continue to use those services.

One answer might lie in the confusing legalese of the EULA (end user license agreement) presented by corporations.

It's necessary, of course, that the EULA be written using the language of the courts under which it will be enforced. But there are two problems with EULAs: first, nothing really requires that they be read, and second, even if something did, they can't be easily understood by the vast majority of consumers.

I'll be the first to admit I rarely read EULAs. They're long, filled with legalese, and they always come down to the same basic set of rules: it's our software, we don't make any guarantees, and oh, yeah, any rights not specifically listed (like the use of the data you use with our "stuff") are reserved for us. It's that last line that's the killer, by the way, because just about everything falls under that particular clause in the EULA.

Caveat emptor truly applies in the world of cloudware and online services. Buyer beware! You may be agreeing to all sorts of things you didn't intend.

Read the rest at DevCentral

Why it's so hard to secure JavaScript

The discussion yesterday on JavaScript and security got me thinking about why it is that there are no good options other than script management add-ons like NoScript for securing JavaScript.

In a compiled language there may be multiple ways to write a loop, but the underlying object code generated is the same. A loop is a loop, regardless of how it's represented in the language. Security products that insert shims into the stack, run as a proxy on the server, or reside in the network can look for anomalies in that object code. This is the basis for many types of network security - IDS, IPS, AVS, intelligent firewalls. They look for anomalies in signatures and if they find one they consider it a threat.

While the execution of a loop in an interpreted language is also the same regardless of how it's represented, it looks different to security devices because it's often text-based as is the case with JavaScript and XML. There are only two good options for externally applying security to languages that are interpreted on the client: pattern matching/regex and parsing.

Pattern matching and regular expressions provide minimal value, at best, for securing client-side interpreted languages, because the same code can be written in an enormous number of equivalent ways.

As we learned from preventing SQL injection and XSS, attackers are easily able to avoid detection by these systems by simply adding white space, removing white space, using encoding tricks, and just generally finding a new permutation of their code.
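The evasion problem can be illustrated with a small sketch. The signature and payloads below are hypothetical and chosen only for illustration; they are not taken from any real security product. A single literal pattern catches the original payload but misses trivially rewritten versions of it.

```javascript
// Hypothetical signature: flags the literal text "<script>alert(".
const naiveSignature = /<script>alert\(/;

// The same payload, rewritten with the tricks described above.
const payloads = [
  "<script>alert(1)</script>",         // original form
  "<script >alert(1)</script>",        // white space added inside the tag
  "<SCRIPT>alert(1)</SCRIPT>",         // different letter case
  "<script>\nalert(1)</script>",       // line break inserted
  "%3Cscript%3Ealert(1)%3C/script%3E", // URL-encoded angle brackets
];

// Returns one boolean per input: true when the signature catches it.
function scan(inputs, signature) {
  return inputs.map((input) => signature.test(input));
}

const results = scan(payloads, naiveSignature);
console.log(results);
```

Loosening the pattern to cover each variant is possible, but every new permutation needs another rule, which is why this approach scales so poorly.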

Read the rest at DevCentral

The impact of the network on ... everything

Back in the day when I was a technical architect and actually wrote code (yes, they did let me do that once) I got into a discussion with the rest of my team about the impact of our code on performance. I was saying white-space was evil because it can unnecessarily increase the number of packets necessary to transfer data. I wanted to go through the code (mostly JavaScript and HTML output) and reduce the white-space to make application response time better.

I was eventually overruled because, well, I just couldn't make the rest of the team understand the impact of our code on the network and performance and hey, one extra packet isn't going to really make a difference, is it?
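The arithmetic behind that argument can be sketched roughly as follows. The 1460-byte segment size is an assumption (a common payload size on Ethernet), the sample markup is invented, and the minifier is deliberately naive; treat this as an illustration rather than a measurement.

```javascript
// Assumed payload bytes per TCP segment; real values vary by network path.
const ASSUMED_MSS_BYTES = 1460;

// Number of TCP segments needed to carry byteCount bytes of payload.
function segmentsNeeded(byteCount, mss = ASSUMED_MSS_BYTES) {
  return Math.ceil(byteCount / mss);
}

// Naive minifier: collapses every run of white space to a single space.
// Illustration only; it would damage <pre> blocks and string literals.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical response body: 90 indented list items (ASCII, so 1 char = 1 byte).
const unit = "<li>\n" + " ".repeat(8) + "item\n" + " ".repeat(4) + "</li>\n";
const original = unit.repeat(90);
const compact = collapseWhitespace(original);

console.log(original.length, segmentsNeeded(original.length));
console.log(compact.length, segmentsNeeded(compact.length));
```

With these made-up numbers the indented output needs one more segment than the collapsed output, which is exactly the "one extra packet" the team was arguing about.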

Just as developers can adversely affect application performance because they don't always grok the network, network topology and connectivity rates can also affect application performance - and a whole lot more.

I stumbled across this great post on MySQL operational performance that describes this very scenario and thought it was awesome anecdotal evidence of the importance of understanding the network.

 

Read the rest at DevCentral

A Billion More Laughs: The JavaScript hack that acts like an XML attack

Don is off in Lowell working on a project with our ARX folks so I was working late last night (finishing my daily read of the Internet) and ended up reading Scott Hanselman's discussion of threads versus processes in Chrome and IE8. It was a great read, if you like that kind of thing (I do), and it does a great job of digging into some of the RAMifications (pun intended) of the new programmatic models for both browsers.

But this isn't about processes or threads, it's about an interesting comment that caught my eye:

This will make IE8 Beta 2 unresponsive

<div id="test"></div>
<script>
  // Appends to the div forever; the loop never yields back to the browser.
  var t = document.getElementById("test");
  while (true) {
    t.innerHTML += "a";
  }
</script>

What really grabbed my attention is that this little snippet of code is so eerily similar to the XML "Billion Laughs" exploit, in which nested entities are expanded over and over until the result is, well, enormous, essentially causing a DoS attack on whatever system (browser, server) was attempting to parse the document.
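To give a sense of scale, here is a rough sketch of the arithmetic, assuming the commonly circulated form of the attack: a three-character base string and nine layers of entities, each referencing the layer below it ten times. The function name and parameters are made up for illustration. It computes the expanded size without performing the expansion.

```javascript
// Size of the fully expanded top-level entity, computed without expanding it.
// levels:     number of nested entity layers above the base string
// fanout:     references to the previous layer inside each entity
// baseLength: length of the innermost string (for example, "lol" is 3)
function expandedLength(levels, fanout, baseLength) {
  return baseLength * Math.pow(fanout, levels);
}

// Assumed shape of the classic document: 9 layers, 10 references each, "lol".
console.log(expandedLength(9, 10, 3));
```

A document of well under a kilobyte therefore asks the parser for roughly three billion characters, which is the same resource-exhaustion effect the JavaScript loop produces one character at a time.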

What makes scripts like this scary is that many forums and blogs that are less vehement about disallowing HTML and script can be easily exploited by a code snippet like this, which could cause the browser of all users viewing the infected post to essentially "lock up". This is one of the reasons why IE8 and Chrome moved to a more segregated tabbed model, with each tab basically its own process rather than a thread - to prevent corruption in one from affecting others. But given the comment this doesn't seem to be the case with IE8 (there's no indication Chrome was tested with this code, so whether it handles the situation or not is still to be discovered).

 

Read the rest at DevCentral

Damned if you do, damned if you don't

There has been much fervor around the outages of cloud computing providers of late, which seems to be leading to an increased and perhaps unwarranted emphasis on SLAs, the likes of which we haven't seen since...well, the last time IT saw anything outsourced reach the hype level of cloud computing. Consider this snippet of goodness for a moment, and pay careful attention to the last paragraph.

From Five Key Challenges of Enterprise Cloud Computing

I won’t beat the dead “Gmail down, EC2 down, etc down” horse here. But the truth of the matter is enterprises today cannot reasonably rely on the cloud infrastructures/platforms to run their business. There’s almost no SLAs provided by the cloud providers today. Even Jeff Barr from Amazon said that AWS only provides SLA for their S3 service.

[...]

Can you imagine enterprises signing up cloud computing contracts without SLAs clearly defined? It’s like going to host their business critical infrastructure in a data center that doesn’t have clearly defined SLA.

We all know that SLAs really doesn’t buy you much. In most cases, enterprises get refunded for the amount of time that the network was down. No SLA will cover business loss. However, as one of the CSOs I met said, it’s about risk transfer. As long as there’s a defined SLA on paper, when the network/site goes down, they can go after somebody. If there’s no SLA, it will be the CIO/CSO’s head that’s on the chopping block.

Let's look at this rationally for a moment. SLAs really don't buy you much. True. True of cloud computing providers, true of the enterprise. No SLA covers business loss. True. True of cloud computing providers, true of the enterprise.

What I find amusing about this article is that the author asks if we can imagine "signing up cloud computing contracts without SLAs clearly defined?" Well, why not? Businesses do it every day when IT deploys the latest "Business App v4.5.3.2a". Microsoft Office 2007 relies heavily on on-line components, but we don't demand an SLA from Microsoft for it. Likewise, the anti-phishing capabilities of IE7 don't necessarily come with an SLA and businesses don't shy away from making it their corporate standard anyway.

In fact, I'd argue that most cloudware today comes with an anti-SLA: use at your own risk, we don't guarantee anything.

Read the rest at DevCentral

Governance in the Cloud

David Linthicum of Real World SOA asks whether SOA governance should be delivered as a service, from the cloud.

Core to this proposition is the use of a registry/repository in the cloud:

This repository would provide more than just WSDL, but a complete design time and runtime SOA governance system delivered out of the cloud, perhaps linked with a local slave repository within your firewall. 

One of the problems with this, I see, is that in a SOA where governance is actively used and policies enforced, governance becomes crucial to not only the day-to-day development efforts but also to run-time execution. I like David's suggestion of a master-slave relationship, but I think it ought to be reversed. The local repository ought to be your master with the slave repository - and public access - in the cloud.

 

Read the rest at DevCentral

Automatically detecting client speed

We used to spend a lot of cycles worrying about detecting user agents (i.e. browser) and redirecting clients to the pages written specifically for that browser. You know, back when browser incompatibility was a way of life. Yesterday.

Compatibility is still an issue, but most web developers are either using third-party JavaScript libraries to handle detection and incompatibility issues or don't use those particular features that cause problems.

One thing still seen at times, however, is the "choose high bandwidth or low bandwidth" entry page, particularly on sites laden with streaming video and audio, whose playback is highly sensitive to the effects of jitter and thus needs a fatter pipe over which to stream.

Web site designers necessarily include the "choose your speed" page because they can't reliably determine client speed. Invariably, some user on a poor connection is going to choose high bandwidth anyway, and then e-mail or call to complain about poor service. Because that's how people are.

So obviously we still have a need to detect client speed, but the code and method of doing so in the web application would be prohibitively complex and consume time and resources better spent elsewhere. But we'd still like to direct the client to the appropriate page without asking, because we're nice that way - or more likely we just want to avoid the phone call later. That would be a huge motivator for me, but I'm like that. I hate phones.  
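Wherever the detection ends up running, the core arithmetic is simple; the hard part is getting a trustworthy measurement. The sketch below is only an illustration: the function names, page paths, and the 1000 kbps threshold are all assumptions, not anything prescribed here.

```javascript
// Hypothetical cut-off between the low- and high-bandwidth pages.
const ASSUMED_THRESHOLD_KBPS = 1000;

// Throughput in kilobits per second, from bytes received and elapsed time.
function estimateKbps(bytesReceived, elapsedMs) {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  // Bits per millisecond is numerically equal to kilobits per second.
  return (bytesReceived * 8) / elapsedMs;
}

// Picks a (made-up) page path based on the estimated throughput.
function choosePage(kbps, thresholdKbps = ASSUMED_THRESHOLD_KBPS) {
  return kbps >= thresholdKbps ? "/high-bandwidth" : "/low-bandwidth";
}

console.log(choosePage(estimateKbps(250000, 1000)));
console.log(choosePage(estimateKbps(50000, 2000)));
```

A single sample like this is noisy, so a real deployment would likely average several measurements before deciding.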

 

Read the rest at DevCentral

The third greatest (useful) hack in the history of the Web

Developers have an almost supernatural ability to work around restrictions, even though some of the restrictions on building applications delivered via the web have been akin to kryptonite. Like Superman fighting through the debilitating effects of the imaginary mineral, they've gotten around those restrictions by coming up with ways to implement functionality and improve the behavior of browsers and thus web applications anyway.

The first greatest hack was giving HTTP state. The second? Cookie-based persistence. The third? The CNAME trick.

THE PROBLEM

The reason the "CNAME trick" came about was a limitation on connections to a single host imposed by browsers, particularly versions of Internet Explorer prior to IE8. With only 2 connections per host name allowed and many times that number of objects on a page, the ability of IE in particular, but really all browsers, to quickly retrieve all those objects and render them was hampered. This resulted in the appearance that the application performed poorly, when in reality it wasn't the application but the inherent delivery mechanisms that were slow due to limitations beyond the user's, the network admin's, and the developer's control.

Users, of course, don't care about any of this. All they know is that the application they are using is slow and they want it fast. And when some of those users are corporate business users, the developers are going to hear about it because the help desk is going to call them when they get barraged with complaints from users. This is the real reason developers develop nearly supernatural powers of hacking; they'll do anything to stop users from complaining.

THE HACK

Developers all over (including ours inside F5, working on building our application acceleration solution, WebAccelerator) figured that if the browser was going to limit the number of connections to a single host, the answer was simply to trick the browser into thinking it was talking to more than one host. Turns out doing this is rather trivial: simply add multiple CNAMEs for the same host to DNS, and then reference those as the host for some of the objects in the page. So www.example.com becomes www1, www2, www3, and so on.

This required changes to the application so that the additional host names were referenced, unless you made use of a proxy-based solution like WebAccelerator and BIG-IP Local Traffic Manager capable of rewriting outbound host names and virtualizing them to appear to the outside world as if they were a single host.
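On the application side, the change might look something like the sketch below. The host list and the hashing scheme are assumptions made for illustration; the only idea taken from the text is spreading objects across www1, www2, www3. Hashing the path, rather than picking a host at random, keeps each object on the same host name so browser caches stay effective.

```javascript
// Hypothetical CNAMEs that all resolve to the same server.
const SHARD_HOSTS = ["www1.example.com", "www2.example.com", "www3.example.com"];

// Small deterministic string hash kept in unsigned 32-bit range.
function hashPath(path) {
  let hash = 0;
  for (const ch of path) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash;
}

// Maps an object path to one of the shard hosts, always the same one.
function shardUrl(path, hosts = SHARD_HOSTS) {
  const host = hosts[hashPath(path) % hosts.length];
  return `http://${host}${path}`;
}

console.log(shardUrl("/images/logo.png"));
console.log(shardUrl("/scripts/app.js"));
```

A rewriting proxy does essentially the same mapping on the way out, which is why the application itself can stay untouched in that setup.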

 

Read the rest at DevCentral

IE8: Robbing Peter to pay Paul

For those of you unfamiliar with the idiom, it should be taken to mean "benefiting one at the expense of another." In this case, Paul is the end-user and Peter is the server administrator. Or better yet, Paul is the browser and Peter is the server.

All web browsers, including IE (Internet Explorer), impose a per-server connection limit to reduce overload on servers. This was introduced back when the web was exploding and browsers opened up connections willy-nilly and made server operators cry. Often. The limitation imposed by IE (two connections per host) was harsher than those imposed later by Firefox, which set the limit at eight.

End-users have often bemoaned the slowness of IE's rendering of web pages. This poor performance was due in part to the limitation on connections. With objects numbering many times the connection limit - and each one requiring a separate request to retrieve - requests could queue up quickly on the client, each one waiting to use one of two allowed connections.

In IE8 the limitation still exists, but it's been increased from two connections to six, to improve page download times and overall performance. 

From Nicolas Berthier: IE 8 speed improvements 

In IE8 Beta 1 we also increased our per-server connection limit from 2 to 6. What this means is that in IE7 and below pages could only download 2 elements from a given server at any one time. Increasing that limit to 6 allows sites to download 3 times as much content in parallel, which should translate into faster page download times when bandwidth is available.

Ryan Breen in his Ajax Performance blog (an excellent resource on the subject, by the way) tested out IE8's new parallelism earlier this year and came to an obvious (at least to me, I predicted this years ago) conclusion:

I suspect that my hosting provider (Dreamhost) simply can’t keep up with the dramatic increase in connection parallelism. 18 connections is simply too much of a good thing, and it will present a scaling problem for those who are on small to medium hosts. 10 users hitting at the same time will yield 180 concurrent connections, a pretty significant number for smaller providers.

Given that Firefox already allows eight connections per host, and now IE users will be allowed six, that's a lot of connections per site. Especially when you consider that IE, despite Firefox's grand gains in recent years, still owns the lion's share of the browser market.

Ryan is so very correct about the impact this will have on hosting providers and really the infrastructure of anyone who runs a web site or delivers web applications via the Internet. Especially Web 2.0 applications that make extensive use of AJAX. The impact of constantly open connections between the browser and the server from AJAX applications was noticeable, but with the increase in per-server connections in IE that impact may become even more noticeable yet.
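Both sides of the trade-off come down to simple arithmetic, sketched below. The model is deliberately crude: it assumes every request takes about the same time and that every client opens its full quota of connections. The figure of three host names per page is an assumption inferred from the 18 connections in the quote (18 divided by the limit of 6), not something the quote states.

```javascript
// Client side: how many sequential "rounds" of requests a page needs
// when only connectionLimit objects can be fetched from a host at once.
function requestRounds(objectCount, connectionLimit) {
  return Math.ceil(objectCount / connectionLimit);
}

// Server side: worst-case simultaneous connections from a set of users.
function concurrentConnections(users, hostsPerPage, perHostLimit) {
  return users * hostsPerPage * perHostLimit;
}

// A hypothetical page with 30 objects on one host: old limit vs. IE8 limit.
console.log(requestRounds(30, 2), requestRounds(30, 6));

// 10 users, an assumed 3 host names per page, 6 connections per host.
console.log(concurrentConnections(10, 3, 6));
```

The browser gets its page in a third of the rounds, and the server pays for it with three times the connections per user: Paul is paid, Peter is robbed.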

Read the rest at DevCentral

Is the URL headed for the endangered technology list?

Jeremiah Owyang, Senior Analyst, Social Computing, Forrester Research, tweeted recently on the subject of Chrome, Google's new open source browser.

Jeremiah postulates:

Chrome is a nod to the future, the address bar is really a search bar.  URLs will be an anachronism.

That's an interesting prediction, predicated on the ability of a browser to translate search terms into destinations on the Internet. Farfetched? Not at all. After all, there already exists a layer of obfuscation between a URL and an Internet destination; one that translates host names into IP addresses, hiding the complexity and difficulty of remembering IP addresses from the end-user. And apparently Chrome is already well on its way to sending URLs the way of the dodo bird, otherwise we wouldn't be having this conversation.

But IP addresses, though obfuscated and hidden from view for most folks, aren't an anachronism any more than the engine of a car. Its complexity, too, is hidden from view and concern for most folks. We don't need to know how the engine gets started, just that turning the key will get it started. In similar fashion, most folks don't need to know how clicking on a particular URL gets them to the right place, they just need to know to click on it.

Operating technology doesn't necessarily require understanding of how it works, and the layer of abstraction we place atop technology to make it usable by the majority doesn't necessarily make the underlying technology an anachronism, although in this case Jeremiah may be right - at least from the viewpoint that using URLs as a navigation mechanism may become an anachronism. URLs will still be necessary; they are a part of the foundation of how the web works. But IP addresses are also necessary, and so is the technology that bridges the gap between IP addresses and host names, namely DNS.

More interesting, I think, is that Jeremiah is looking into his crystal ball and seeing the first stages of Web 3.0, where context and content are the primary vehicles that drive your journey through the web rather than a list of hyperlinks. Where SEO is king, and owning a keyword will be as important, if not more so, than brand. The move to a semantic web necessarily eliminates the importance of URLs as a visible manifestation, but not as the foundational building blocks of how that web is tied together.

Read the rest at DevCentral

Lori