Posted 1 Apr, 2013 by dave@loggly.com
in Business, Log Management, and Startup
A few days ago an editor from CIO.com asked Loggly to provide two tips on hiring/retaining engineering talent for a future article. “Sure -- happy to help”, I replied. I then mentally outlined how we build our highly sought-after development team and keep it happy (of our 24 employees, 25 counting one who just signed, 22 write code). The standard Silicon Valley startup checklist popped up:
- Competitive salary, equity packages, etc
- Open vacation, flex hours, etc
- Catered lunches, snack bars, and top-shelf beverages
- Latest-greatest technology and development environments
But to really attract, build and keep top talent you need more. You need the ability to:
- Work on a product that thrives on big data for a formidable challenge ($100b market, per IDC)
- Solve a big, market-created problem that matters... and one that people pay for
- Make a difference, and be recognized for it.
The last line was the true angle to build on, and two examples quickly came to mind: how we target our hires and welcome them, and how we keep engineers aligned and motivated on the problem their code will solve, essentially the direct connection to our customers' log management challenges.
I’ll skip the tip I sent over on our on-boarding process (maybe I shouldn’t ever share it ;) but the second tip was about sending our developers to work a half-day at the Loggly booth at a tradeshow. After reviewing our article suggestions, the CIO.com editor wrote back:
"Thanks for the tips... Unfortunately, they weren't quite right for the article.
(Every IT guy I have ever known has DREADED being asked to man the booth or attend trade shows and conferences)."
I was shocked... and then I wasn’t. It only confirmed what I already knew: we are not our father's IT company. We aren’t:
- Building an old-school IT product (legacy deployment cycles aimed at non-cloud savvy companies run by stiff CIOs)
- Selling an old-school IT product (hundreds of end-users who are also our buyers sign up WEEKLY)
- Selling into the CIO and expecting our technology to be forced down (vs adopted, loved and shared)
- Looking for, hiring or retaining old-school ‘Dilbertized IT guy’ engineering talent who hide at the mere thought of customer interaction
So why would I expect CIO.com to immediately see the value of the tips? I shouldn’t. At Loggly, we are different by design. We've developed a 100% cloud-based service loved and used by over 2,500 of the industry's leading cloud-centric brands (AirBnb, Adroll, Sony, EA, GrubHub, Heroku), and we have a fun mascot named Hoover whom people love and want to interact with (here's Loggly's Hoover vs. the @Spotify shark at #Pycon).
This translates directly into my role driving Marketing and Revenue, where I look for innovative developers and SysOps people (guys and gals) who want to simplify log management (we can help you!! Free 30-day trial of Loggly), and it flows directly into who we hire, how we retain talent and how we motivate our teams.
Heading into PyCon, I had the enviable Marketing challenge of selecting booth staff after getting numerous emails, Skypes and SMS messages from engineers asking if they could attend and work the booth: a great problem to have. As it turned out, our CTO and two lead developers attended for a half-day, and it served to energize them about:
- The billion dollar opportunity we are solving in log management (the booth was busy!)
- The ability to hear from customers and prospects exactly what they want in a product and how they are using it (the good and the bad)
- The way they can make a difference with their efforts to improve and continually optimize the product (our customers enjoyed the face-to-face conversations)
Our engineers arrived back at work on Monday really stoked to share their booth experience, their conversations, and their ideas on how to make the next killer version of cloud-based log management to solve the challenges of our next 2,500+ customers.
So while the tip wasn’t good enough for CIO.com, I’ll take that as a badge of honor. We are doing something different, things are going great and people are excited to play a role.
If you aren't the luddite "IT guy" and you are looking for a change, let's chat! Or please send a peer our way.
Cheers!
Posted 12 Mar, 2013 by Philip O'Toole
in Business and Log Management
Imagine this. It's the night before Thanksgiving, and you're having dinner with your wife and in-laws at a nice restaurant. Your cellphone rings. It's the Senior Director of Engineering, your boss, responsible for the new Cloud Portal you and your team brought up a few weeks back. This can't be good. And it happened to me, at a big, well-respected, engineering-focused company. #truestory
It turns out a key customer had decided on this night, of all nights, to place a large order for licenses, but our Portal was returning HTTP 500. "What could be happening?" he pleads. "Hmmm, can we get the logs?" I reply. "We can't," he tells me. "Operations never set up systems to take them off the box." Sigh... Sound familiar?
This happens all the time within IT organizations. It turns out that one of the worst places logs can be stored is on the machine generating those logs. What if that machine contains Insider Information, and access to it is restricted? What if the network link to it is down? What if all you can do is think really, really hard to work out why that key customer can't place that order? But it doesn't have to be this way.
If you love your logs, set them free. Get them off the host machines. You, your boss, and your customers will thank you when you can diagnose issues from anywhere, in real time. Get them into Loggly, and let us do the hard work for you. Best part: you can do it all without on-premise software, local agents, or disrupting production machines.
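As a rough illustration of getting logs off the box, here is a minimal Python sketch that attaches a remote syslog handler from the standard library's `logging` module. The function name, host and port are placeholders invented for illustration, not Loggly endpoints; check your own account's setup for the real destination.

```python
import logging
import logging.handlers


def build_remote_logger(name, host, port):
    """Return a logger that also ships every record to a remote syslog endpoint.

    host and port are whatever your log service gives you; nothing here
    is specific to Loggly. The handler uses UDP, the SysLogHandler default.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With something like this in place, a call such as `logger.error("checkout returned HTTP 500")` leaves the machine the moment it is logged, so the evidence survives even when nobody can reach the box.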
Anyway, my boss and I solved the issue, and the order was placed, but it took until the taxi ride home. Life Lesson? Enjoy Thanksgiving and other holidays stress-free... learn from others so this doesn't happen to you.
Posted 14 Feb, 2013 by dave@loggly.com
in Business, Code, and Log Management
Last Friday, New York Times reporter John Broder published a less-than-rosy account of a highway trip between Washington D.C. and Boston, cruising in Tesla’s Model S luxury sedan. The purpose of the trip was to range test the car between two new supercharging stations with a "speedy road trip". Broder wrote about his anxiety-ridden stretches between charging stations, when energy consumption was outpacing mileage. Among other complaints, he was unhappy about the need to turn off the heat on a cold Northeastern day and to drive slowly in an effort to conserve battery power. The trip ended not at the charging station but on the back of a tow truck, the car having run out of electricity -- a result he documented with a feature photo of the car being towed. Ouch.
This wasn’t the PR outcome that Tesla Motors chief executive Elon Musk expected. Avoiding a factless he-said-she-said on Twitter, he gave an interview saying that he was planning to publish the log files of the reporter's vehicle, since they were at odds with details in the story. As reported in Venture Beat: “Musk claims Broder failed to mention how much he was punching the accelerator early in the ride, a move that Tesla warns its customers will drain the battery faster. Also, Musk says Broder took a detour through Manhattan. And he didn’t fully charge the car before departing.”
While the logs have yet to be published, if they are and if they do support Musk’s claims, it will be advantageous for Tesla -- and for the “logging” community at large, bringing to light the next big category of business intelligence. "Your honor, I'd like to call the #LogFile as my next witness." Amusing, but true, and it will be happening more and more often inside and outside of business.
As this story demonstrates, software is behind everything these days. Massive machine data is being generated every minute not only from our computers and cell phones, but from the cars we drive, the appliances we use, and virtually anything with a chip, motor or battery run with an operating system. The golden nuggets of truth exist within the log files, holding the detailed data about an event that happened, one that cannot be disputed.
Such data, when mined and organized properly, provides a wealth of indicators for solving problems, such as why a web transaction timed out or whether a server is merely “on” or actually performing as intended, and for defending the veracity of a product’s claims. The value to IT departments (sysops, techops, devops and product developers) of real-time access to log data for feedback on product performance is tremendous. Log files can show where users struggled or took too long to accomplish tasks, or where your applications or hardware let them down. In trial-by-public cases like these, log file data shines in a new light, far outside the walls of technology and in the vernacular of the general public.
Logs don’t lie, but the truth can’t come out unless there are affordable and easy ways to release insights from these enormous log databases within the window of time in which they matter. Moving the conversation forward, suppose Tesla were able to collect and analyze log files from all of its vehicles, receive alerts on issues, determine whether they need addressing or are simply isolated events caused by user error, and feed guidance right back to the driver before the situation arose. That's the end game: taking aggregate user data to find nuggets of wisdom that can then be fed back to the end user with guidance -- seamlessly.
That's where we come in. Loggly’s cloud-based log management service gives companies fast, centralized access to all of their log data, so they can solve issues, identify problems and make customers happy again by answering the question "is this a needle in the haystack or the tip of the iceberg?" In the court of public opinion and social everything, or in the case of a 100% cloud-driven business... anything less than real-time is becoming really-late.
When consumer product CEOs start talking about logs, that's a pretty good sign that log files and log analytics are not just a stream of text and data to throw on the backup server every night -- if you didn’t run out of storage for them already. Smart log file mining helps companies keep customers happy and their bottom line bigger. If you run a data-driven business, the more your company can act on that data to improve application/service/product performance and experience, the better off your customers will be.
If Elon Musk is driving the future of Tesla, an automobile brand, off his log files, shouldn't your cloud application-driven business also be harnessing the full power of log intelligence?
Posted 15 Mar, 2012 by Brian Schroeder
in Business and Startup

Writing software is hard. Writing it well is even harder. I have been rolling some ideas around in my head on how to make software that is better for the people creating it and the people using it. Software has some distinct (and not-so-distinct) phases that it goes through as you work on it. The first of those phases for developers is the design phase. Here are some ideas on making better software during the design process.
If you start with "you know what would be cool?", kill the idea.
OK, so maybe that's a little harsh, but that's my knee-jerk reaction when I hear a "you know what would be cool" idea. Let me rephrase my suggestion: stop and think. You may be wondering what I have against doing cool things. Nothing. There's nothing wrong with doing cool things. The problem is that we developers gravitate toward using a cool piece of technology for the sake of using it rather than finding the best fit for our problem. We often attempt to solve a problem using either a cool new piece of technology in a way that it wasn't intended for, or new language features when an alternative would have been better suited. Redis, MongoDB, RabbitMQ, Python, Ruby, Clojure, Scala and Node.js are all great technologies that meet various needs in their own ways. Sometimes they may be the best fit for your problem, but not always. Just think twice when you think you're about to do something "cool". Developing software is less about being cool and more about building something that works well.
Aim for simplicity.
Software developers have a knack for building and stacking and inheriting and composing until the original simple solution is unrecognizable in a mound of clever engineering. When you're thinking about the next piece of code that you're going to write, start simple. Don't build something in your mind that is more complex than it has to be. You should always use the simplest thing that works, except for when you shouldn't. Next point.
The simplest solution isn't always best.
Sometimes the simplest thing just won't cut it. Sometimes you need that really weird class hierarchy or a really tricky algorithm or something that really hurts your brain to even think about. Usually this is going to happen on the second iteration of a problem. You should still always start with the simplest thing. If that doesn't seem to hold up, go on to something more complicated, but keep the complexity to a minimum.
Don't build more than you need.
It's often tempting to try to build a solution that handles every contingency or meets needs a user doesn't even know they have yet. When you are tempted to design everything into your application, think twice. Don't design what you don't need. If you can come up with a good reason for designing it in, then go for it. Otherwise, don't bother. Build just what you need and no more.
Think long-term.
I've seen so many developers (including myself) focus on getting something working now and neglect foreseeable issues with maintenance and deployment (which are the two major areas that this impacts). When you are thinking about the next piece of code that you are going to write, take the time to make sure you are focused on the long-term solution. This may sound like it contradicts my previous idea of doing the simplest thing possible, but I think these ideas can co-exist. Start simple. Get something working. But don't plan on getting it "just working"; plan on getting it working so that the next generation of developers that work on it will be happy you made the decisions you did rather than having them curse your name.
These aren't new thoughts. I've heard them stated by other experienced developers in various ways. And they're not immutable laws of nature. I think they're good guidelines, but the world won't come crashing down on your ears if you do something different. But I do believe that your experience of designing software will benefit from it. And as a result, your software will be better for you and your customer.
Posted 23 Feb, 2012 by inga weizman
in Business and Startup

We’ve all dreamed of doing our job while sipping a margarita on the beach. Loggly and Scalr can get you there... well, almost. We love things that scale and automate, and all things cloud, and that is why we partnered with Scalr. Scalr is an open source cloud management tool that brings automation to web applications. The cloud is all about scalability, growth, and making things easier to manage. Scalr gives you the ability to manage as many servers as you need on different cloud computing services, and it adjusts load capacity as you have spikes and valleys in your traffic. It’s like a systems admin machine that never sleeps and doesn’t need energy drinks or coffee, so now your systems admin can actually have a life. Loggly comes in and takes care of all the logs: we store them and make them searchable, and we provide features such as monitoring, troubleshooting and user analytics, so nothing is lost and you can figure out what your users are doing. Sit back, relax and see what Scalr can do for you.
- DNS Management: automatically created and updated for you
- Fault tolerance: servers crash at the worst times; now the problem is detected and automatically resolved for you
- Multi-cloud deployments: no need to commit to just one, infrastructure can be spread out across multiple providers
- Integrated SSH and key management
We're super excited to be partnered with such a cool company that's helping web apps scale and making your life easier.
Posted 7 Feb, 2012 by Kord Campbell
in Business and Startup
I recently came across an eWeek article titled 2012: A Cloudy Year for Big Data by Frank Ohlhorst. You could easily say I have a few opinions on the matter of big data! :)
First, I agree with Frank’s first notion that big data is neither big nor new. The fact is, I've been saying things like "Dude, that's a ton of data!" since I started notching out the opposite sides of floppies back in the 80s. Remember these?

Ohlhorst quickly follows up his vague handwaving about the term ‘big data’ with, “For most of its existence, big data has been out of the reach of small and midsize businesses (SMBs) because the storage and processing power needed to make this technology work is too expensive.”
Companies have been doing for years what they need to do to run a better business, regardless of whether or not it’s expensive. In manufacturing, the cost for a small company of optimizing how to efficiently make tons of a cheap product can actually be quite a bit higher than for a larger company making a few units of a complex product. In the same vein, smaller businesses may have more complex business optimization processes than larger ones, and may require relatively larger amounts of data to solve those problems.
I agree with Frank that small businesses don't always have the resources necessary to solve massive scale problems, but again the problems are relative. For example, small software startups don't have project managers where larger ones do, not because they can’t afford them, but because they really don’t need them in a full-time capacity. I think this may be part of why SaaS has been a huge hit and why the term "cloud" has taken off. SaaS allows companies to tackle a wide variety of problems across the entire business, all the while providing cost-effective high tech solutions to solve problems in a way you could never have done before. For the first time in history the quality of business processes is experiencing sustained growth.
“These new cloud-based capabilities are on a growth path and are creating more opportunities for even the smallest of businesses to leverage big data without the traditional expenses of compute farms and massive storage arrays.”
Yes. However, compute farms and storage aren't the main thing that these companies need. They need access to the raw data about their business, and the tools to extract the data on which they can take action. Figuring out your company’s problems requires brain power, understanding, data and tools. CPU and disks don't solve complex problems. People do.
It's All About Application Analytics
Ohlhorst also describes big data analytics as being comprised of three primary elements: volumes of unstructured data, processing power and algorithms. However, big data doesn't always imply unstructured data. Log files, or what Loggly calls Big Time Data™, typically contain a large amount of structure. Dealing with structured data isn't always easy: if you write software that 'expects' a certain structured format, your analysis can be broken or flawed when it encounters data that doesn't fit the structure you coded for. One way around this problem is to apply extra metadata to the data set. One such technique is adding a search index to the data, which is the approach Google pioneered and what Splunk and Loggly do for log files or time series event data. By being able to text-search the data and interact with it in real time, or near real time, the user can focus on solving the problem.
Ohlhorst continues, “For it to be true big data, there has to be lots of it, and most SMBs don’t generate that volume of data internally, which leads them to seek out alternative data sources. Here, the cloud delivers.” Not true. Big data should be defined as an amount of data that a human cannot reasonably digest. Generating large amounts of meaningful data is actually a bigger problem. Again, it's about understanding the problem you have before you can solve it.
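To make the point about code that 'expects' a structure concrete, here is a small Python sketch. It assumes events arrive either as JSON objects or as free text; the function names and the event layout are hypothetical, and the `search` helper is a naive substring scan standing in for a real search index, not a description of how Loggly or Splunk index data.

```python
import json


def parse_event(line):
    """Parse one log line, keeping the raw text when it is not a JSON object."""
    try:
        data = json.loads(line)
    except ValueError:
        data = None  # not JSON at all, e.g. a classic syslog line
    if isinstance(data, dict):
        return {"structured": True, "fields": data, "raw": line}
    # Unexpected shape: keep the event instead of failing the analysis.
    return {"structured": False, "fields": {}, "raw": line}


def search(events, term):
    """Case-insensitive text search over the raw line of every event."""
    needle = term.lower()
    return [event for event in events if needle in event["raw"].lower()]
```

Because every event keeps its raw text, a line that breaks the expected format is still searchable rather than silently dropped.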
Yup. Ohlhorst explains that throughout 2012, data sets and others can be expected to grow exponentially - “The amount of data being generated globally increases by 40 percent a year", according to the McKinsey Global Institute, a data analytics research firm. True. The access of this data, mostly through the web, generates vast amounts of data as well. Ohlhorst continues that information needs to be organized, sorted and processed- and that takes computing power. Frankly nowadays, CPU is cheap enough that most of these problems can be solved on your laptop. Fast CPUs for crunching 'big data' aren't the problem any more than a search engine's main problem is crawling for data. The real bottleneck is adding meaning to the data that a customer can digest and make actionable.
PaaS/IaaS Accessibility Is a Problem
I’m glad that Ohlhorst recognized that Amazon isn’t the only one in the game offering private cloud-based big data analytics platforms. He believes that since this technology is designed as a complete platform and not as a service, these platforms are still out of the reach of the SMB market.
Ohlhorst is right that these platforms are out of reach - but not just because they are designed as a complete platform and not as a service. I think it's because SMBs don't know they need it, don't have the data to put on it, and don't have the resources to manage it. There are plenty of hosted solutions out there (see SalesForce and their app marketplace) that provide some serious horsepower to the most important task - managing a company's contacts.
And of course, there had to be a Splunk mention in his article. Splunk sells expensive enterprise software. Their software is often times the most expensive piece of software a company has ever bought. Sounds like Oracle, eh? They aren't converting big data analytics into cloud services; they are simply taking their product and making a slimmed down version into a cloud offering they can generate leads with. Any serious big data customer they land will have to buy that very expensive solution and install it on a bank of computers and then pay people to manage it. SaaS is not what Splunk is taking to the market when they go IPO. It's their hellaciously expensive software licenses.
Big Time Data™. It's in the future of your small business.
Posted 19 Dec, 2011 by Kord Campbell
in Business and Startup
Sometimes there's just no other way to say "we're down" than just admitting you screwed up and are down. We're in the process of rebuilding the indexes of historic data of our paid customers. This is our largest outage to date, and I'm not at all proud of it.
So What Happened?
Yesterday afternoon all of our machines on Amazon's East region, availability zone 1d, were rebooted by AWS staff for maintenance purposes.
The cause of our failure is what some of you on Twitter are calling "a failure to architect for the cloud". I would refine that a bit to say "a failure to architect for a bunch of guys randomly rebooting 100% of your boxes". We've been told by Amazon they actually had to work hard at rebooting a few of our instances, and one scrappy little box actually survived their reboot wrath.
While some might go on a rant about how 'normal' failures don't affect 100% of your boxes the truth is that any and everything (including an army of reboot monkeys) can be expected to happen to your servers if you wait around long enough. The trick to being good at running a reliable service is to architect around any number of everythings that could happen to your service and build for it.
In this case we didn't build the workaround simply because the system we run - a combination of 0MQ+Solr+Zookeeper+Loggly Special Sauce - makes it extremely challenging to survive a complete failure with more than 1/2 of the cluster missing. With other challenges facing us, we decided to live with the risk.
So, How Do We Make This Right?
Single instances of Loggly's search cluster can't be spread across multiple availability zones or regions due to the amount of data we push around, latencies between the search nodes, and the lack of support in our system for redundant indexes. We've been OK with those limitations in the past simply because we systematically archive data to S3 when it arrives and we are capable of rebuilding indexes on the fly if we lose one or more indexers.
Our primary method to address this will be to start sharding our customers across multiple Loggly deployments. This will prevent further outages to the entire customer base. We've already been investigating other data centers on both dedicated hardware and other cloud-based services.
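For readers wondering what sharding customers across deployments can look like, here is one common approach sketched in Python: a stable hash of the customer ID picks the deployment. This is an illustration only, not a description of how Loggly assigns customers, and the function and deployment names are made up.

```python
import hashlib


def deployment_for(customer_id, deployments):
    """Pick a deployment for a customer using a stable hash of the ID.

    The same customer always maps to the same deployment as long as the
    list of deployments is unchanged.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return deployments[int(digest, 16) % len(deployments)]
```

With customers spread this way, losing one deployment affects only the customers mapped to it. The trade-off of plain modulo hashing is that changing the number of deployments remaps most customers, so a lookup table or consistent hashing may be a better fit when deployments come and go.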
Finally, we accept full responsibility for the impact to our customers. We will be in touch with our paid customers sometime over the next week to address compensation for this outage.
We welcome feedback below.
Kord Campbell, CEO
Posted 16 Nov, 2011 by Mike Blume
in Business, Code, and Startup
One of the fundamental challenges of distributed coding is deciding what/when to integrate. Sure, that patch your colleague just sent you looks good, but is it actually ready to go into master? At Loggly, we've been feeling our way towards a disciplined integration process. A year ago, our frontend developers were all making commits directly to trunk in a single SVN repo. Once every few weeks, we'd run `svn up` on our servers, and hope for the best. Today our code goes through peer review, unit testing, and static analysis before it even touches our master branch.
Like most projects these days, the process starts on GitHub. Fork. Push a feature branch to your repo. Open a pull request. Go through a couple rounds of discussion and revision. Merge. Every change to our code goes through this process. At first we thought it would slow us down, that we'd want pull requests for the nontrivial code and to just push to master for the easy stuff. After just a few days, we found the pull requests weren't slowing us down at all, and that we all enjoyed the greater transparency into our colleagues' work.
Once we merge, the automation kicks in -- our default integration branch is 'proposed', so clicking merge doesn't actually get the code into the master branch. Jenkins polls our 'proposed' branch once a minute, then runs a simple preflight script on the code.
Rather than keep that preflight in a Jenkins configuration page, we have it checked into the codebase so that any developer can run it too; this way there's no excuse for breaking the build -- you should have seen it break locally =P
Here's our preflight script. Let's go through it line by line.
DIR="$( cd "$( dirname "$0" )" && pwd )"
APP=$DIR/..
First we figure out where we're running, so that we can find the other scripts distributed with the app.
$DIR/purge_pyc
Next we purge pyc files. This is done because if a user recently switched from a branch which contained files which don't exist in our branch, the pyc files may still be around, and may be found by the interpreter.
$DIR/syncenv
Next we run a script to sync our python virtual environment, and ensure all requirements are present.
$DIR/runtests
Here, of course, we run our unit tests. Each run prints a coverage report, so that as we recover from our testing debt, we can measure our progress.
&& $DIR/lint...
Next, and this is important, we run pylint over the parts of our app that we expect to pass with no warnings. As we clean up our app, we continue to add modules to this list. Pylint does a few useful things for us. It looks for trivial name errors of the kind that could quickly cause code to stacktrace -- using a module without importing it, etc. It also enforces certain kinds of coding discipline. Our functions and modules can't exceed a certain length. The cyclomatic complexity of our functions is limited.
If all of this passes successfully, Jenkins automatically pushes the checked-out commit to master, which is where we base our development. Thus, we're always basing our development on known-vetted code.
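The gate described above can be sketched in a few lines of Python. This is a stricter variant than the shell script, in that it stops at the first failing step; the function name, the step names in the usage note and the injectable `runner` parameter are assumptions made for illustration, not the actual Jenkins job.

```python
import subprocess


def run_preflight(steps, runner=subprocess.call):
    """Run each step in order and stop at the first non-zero exit code.

    steps is a list of argv lists; runner takes one argv list and returns
    an exit code (subprocess.call by default). Returns (passed, failed_step),
    where failed_step is None when every step passed.
    """
    for argv in steps:
        if runner(argv) != 0:
            return False, argv
    return True, None
```

A caller would promote the commit only when the first element of the result is true, for example after `run_preflight([["bin/purge_pyc"], ["bin/syncenv"], ["bin/runtests"]])`, where the paths are placeholders. Passing a fake `runner` makes the gate itself easy to unit test.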
If any of it fails, Jenkins still has a couple more tricks to pull. Here's our on-failure script:
This comes in two parts. The first runs a standard-issue git-bisect between origin/proposed and origin/master. Since origin/master has already been vetted by Jenkins (that's how it became master), we know there'll be a regression somewhere between the commits. This goes into the session output, and is e-mailed to the relevant committers. Next, we roll the proposed branch back to the already-vetted master branch. Whatever pull request broke the build will have to be re-made from scratch.
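For anyone who hasn't used bisection, the idea behind it fits in a few lines of Python. This is a toy model of the search, not the on-failure script itself: it assumes a linear list of commits where the oldest is known good and the newest is known bad, and the `is_good` callback stands in for running the preflight on a commit.

```python
def first_bad_commit(commits, is_good):
    """Binary-search an ordered commit list for the first failing commit.

    commits runs oldest to newest and needs at least two entries:
    commits[0] is known good (the vetted master) and commits[-1] is
    known bad (the failing head of proposed).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

Each round halves the range, so the number of preflight runs grows with the logarithm of the number of commits between master and proposed rather than with the number of commits itself.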
Posted 28 Oct, 2011 by David Lanstein
in Business and Code
Well, I'm back, and this time I'm here to talk about an awesome product that we use all the time: PagerDuty. We use it internally for our own alerting (as do a number of Fortune 500 companies along with a million other startups), but we've also integrated it into Alert Birds, our alerting tool. With Alert Birds, you configure saved searches that run against Loggly over a period of time that you select, and Alert Birds escalates alerts in PagerDuty. Before you can do any of those things, however, you need to set up the PagerDuty endpoint.
After you've done that, the next thing you'll need to do is to configure a saved search, and then configure the alert that you want to run. The search itself is pretty straightforward: it has a name, a search string e.g.
(this is why it's cool to send us JSON!), and a list of inputs and devices that you choose - you may want to run a particular search on only your web servers, for instance. The interesting bit is the alert itself, which runs a search that you choose, but has a number of options as to what conditions constitute an alert, and what the message should be:

This is where PagerDuty comes in. Although you can send a GET or POST request to an endpoint of your choosing with the alert data, triggering an alert in PagerDuty is far more useful, as they can SMS/email/phone you, and they also handle escalations and reporting. So, in the example above, if my web servers are spewing 500 exceptions, I want my ops folks to get notified, provided there are more than 10 - I don't want to wake anyone up over a little blip! I'm a nice IT manager like that. Anyhow, once an alert is in a critical state, it will run your search every minute until you're below the threshold, and once that happens, Alert Birds will automatically resolve your alert in PagerDuty.
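The trigger-then-resolve behaviour described above boils down to a tiny state machine. The Python sketch below is a guess at the logic, not Alert Birds' actual code: the class and method names are hypothetical, and it assumes "below the threshold" means the count no longer exceeds it.

```python
class ThresholdAlert:
    """Trigger when a count exceeds the threshold; resolve once it no longer does."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.critical = False

    def observe(self, count):
        """Take one search result count and return 'trigger', 'resolve' or None."""
        if not self.critical and count > self.threshold:
            self.critical = True
            return "trigger"
        if self.critical and count <= self.threshold:
            self.critical = False
            return "resolve"
        return None  # no state change, so nothing to send
```

With a threshold of 10, feeding in the counts 3, 12, 15, 10 and 2 produces a single 'trigger' at 12 and a single 'resolve' at 10, so nobody gets paged twice for the same incident.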
That's pretty much all there is to it! You can find the docs on Alert Birds here. Please do drop me a line at support@loggly.com if you need a hand, and until next time, happy alerting!
Posted 31 Aug, 2011 by Kord Campbell
in Business and Log Management
I'm pleased as punch to announce the Loggly Add-on for Heroku is now in private beta! The fine folks over at Heroku just emailed me the good news a few hours ago:
"Heroku is very excited to announce the availability of the Loggly add-on to our thousands of developers. Loggly is intuitive, easy-to-use and makes logging fun again by providing a rich set of features enabling users to search and analyze their logs."
It's been over a year since we first visited Heroku's offices to discuss providing a logging add-on for their platform users. The result of both companies' efforts over the last year is the first third-party Heroku logging add-on, which leverages the power of our highly scalable logging search engine and the sophistication of Heroku's new Logplex infrastructure.
Simply put, it's awesomesauce in the cloud.
Solving a Big Problem
The challenge Heroku faced with customer logs centered around getting access to all of the logs out of a dyno's stack. Heroku's stack can generate log events from the load balancer, cache, and database server, as well as logs from other add-ons, and more. While Loggly customers have been able to send logs from the app layer for a while now using Ed Muller's super duper Logglier library, getting the remainder of the events from the stack required a rework of the way those events were routed around on the Heroku platform.
Earlier in the year Heroku released the first set of features based on this work, dubbing the project LogPlex and open sourcing it over on Github. This solution allowed Heroku and Loggly users to use the Heroku-authored Logging Add-on to forward logs to a syslog port over on Loggly. This worked well for getting access to events out of the remainder of the Heroku stack, and set the stage for Loggly to build a proper add-on for users to add to their Heroku account.
Without all the hard work from the fine folks at Heroku, we'd never have been able to pull off writing our Add-on. You guys rock!
Scaling is Hard
Scaling a sophisticated platform as a service offering like Heroku is a massive challenge. There are brilliant people over at Heroku who have spent insane amounts of time working on scaling their platform to hundreds of thousands of applications, all the while adding non-trivial features like Logplex to their infrastructure. It's a bit like changing tires on a fighter jet flying at Mach 2.
Loggly has been spending time changing tires on jets too. When we launched in February of this year we supported a paltry 2GB a day per account. In April we raised that to 4GB a day per account and in June we doubled that to 8GB a day, as well as releasing new pricing plans supporting custom volume and retention times. Today Loggly has over 2,500 customers, and we just upped our volumes to 12GB a day per account in anticipation of our Heroku Add-on launch. We'll continue to up our volumes over the next few months, and continue to add features which provide custom logging solutions for web application developers.
Be sure to keep an eye out for more Loggly features like realtime feeds and alerting real soon.
Keep on logging!
Posted 22 Jun, 2011 by Marie Schultz
in Business and Log Management
Chris from Siloam Springs from Hoover Beaver on Vimeo.
Christopher Hobbs, senior system administrator for The City of Siloam Springs, Arkansas, talks to Kord about how he uses Loggly to debug and troubleshoot the dizzying array of systems he maintains for the city and police and fire departments. You can follow Chris and Siloam Springs on Twitter, and browse his public repos on Github.
Posted 22 Jun, 2011 by Marie Schultz
in Business and Log Management
Chris from App47 from Hoover Beaver on Vimeo.
Chris Schroeder, CEO of App47, talks to Kord about how App47 is embedding Loggly into their offerings to provide analytics and troubleshooting for mobile developers, focusing on the enormous challenge of understanding user behavior, improving the experience and troubleshooting application crashes. You can follow App47 and Chris on Twitter.
Posted 10 May, 2011 by Kord Campbell
in Business and Code
A few weeks ago I wrote up how to implement simple alerting with Loggly and PagerDuty. This week I’m covering how to do something very similar with the new version of Amazon’s CloudWatch, which they recently released.
Amazon doesn’t rely on a monitoring agent to collect the metrics for CW, so it’s literally a few clicks in the AWS interface to start using it. Data is collected by their pre-instrumented hypervisor and then forwarded to the CW service, where it can be selected, displayed and alerted on by the user.

With the latest release of CW, Amazon provides new endpoints in the CW API which allow a user to send in custom metrics. These metrics can be used in combination with the hypervisor-based metrics to build complex alerts and drive auto-scalability for applications based on EC2.
It’s that new functionality that I’ll be using to send data from Loggly to CloudWatch.
The Code
As always, the code for this post is parked on Loggly’s Github account. The cloudwatch.py file contains the signing bits required for talking to Amazon’s API endpoints, and some basic code for posting to the PutMetricData method. You don’t need the boto library for this, but it won’t hurt if you already have it installed.
The detailed instructions for setting it all up are on the Github project page. Basically all you need to do to get this running is to get syslog-ng forwarding your web logs to Loggly, configure your Loggly credentials, and then enter your AWS_ACCESS_KEY_ID and AWS_PRIVATE_ACCESS_KEY_ID in the code.
You’ll need a few cheese shop libraries installed, including httplib2, simplejson and hoover, the Loggly Python library.
Set up a cronjob file that runs it periodically, preferably on an instance you are monitoring.
*/5 * * * * python ~/loggly-watch/main.py
The Result
The code above conducts a simple search on Loggly for all events being sent to the default input for your account. If all you are sending to that input is combined_access formatted log lines, then you’ll end up with hit counts sampled every 5 minutes from Loggly, offset by one minute to ensure we’ve indexed them properly.
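The sampling described here can be sketched roughly as follows. This is not the code from the Github project: the function names, the `WebHits` metric name and the `Loggly` namespace are invented for the example, and the dictionary simply mirrors the fields CloudWatch's PutMetricData call expects (`MetricName`, `Timestamp`, `Value`, `Unit`). The client is passed in, so the sketch runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def search_window(now, period_minutes=5, offset_minutes=1):
    """Return (start, end) of the sampling window, shifted back by the offset."""
    end = now - timedelta(minutes=offset_minutes)
    start = end - timedelta(minutes=period_minutes)
    return start, end


def metric_datum(name, hit_count, timestamp):
    """Build one entry for the MetricData list of CloudWatch's PutMetricData."""
    return {
        "MetricName": name,
        "Timestamp": timestamp,
        "Value": float(hit_count),
        "Unit": "Count",
    }


def publish(client, namespace, datum):
    """Send the datum with any client that exposes put_metric_data (e.g. boto3)."""
    client.put_metric_data(Namespace=namespace, MetricData=[datum])


now = datetime(2011, 5, 10, 12, 0, tzinfo=timezone.utc)
start, end = search_window(now)
datum = metric_datum("WebHits", 1234, end)  # "WebHits" is a made-up metric name
```

If you do have boto3 installed, `publish(boto3.client("cloudwatch"), "Loggly", datum)` would be one way to send it; the project itself signs the request by hand instead.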

The result is pretty impressive, with so little work involved. You can even do combo graphs containing metrics delivered by the AWS hypervisor.
Alarms
Once the metrics are flowing in, you can set alarms to trigger if they go over (or under) a certain threshold. In the screenshot below I’m monitoring for the term ‘exception’ coming in from my crappy blog which is hosted on AppEngine and which logs with my AppEngine async logging library.

The screenshot above shows where CW triggered an alarm for exceptions, then cleared itself after the threshold dropped below 4.
Monitor the Monitor
With Loggly and CloudWatch alerting, there are a whole host of monitoring and correlation use cases you can tackle with just a little bit of hacking. You can even alarm on the cronjob itself to ensure your monitoring is functioning and healthy. Here’s how.
Start by making sure your local syslog instance is sending data to Loggly, and then change your cronjob to pipe its output to logger:
*/5 * * * * python /home/kord/code/loggly-watch/main.py 2>&1 | logger -t cloudwatch-cron
Next, set up a search in the same main.py file you are calling with cron to search for a successful run of the cronjob that runs the search (that’s so meta it hurts):
Note: I’m keeping this example purposefully simple. In practice you’ll probably want to make this check a little more sophisticated by verifying that the response from the Loggly server is valid and that each search ran successfully.
Finally, create an alarm that triggers if the number of results is less than 1 over a 10 minute period.
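For reference, an alarm like that could be described with parameters along these lines. This is a sketch, not what the screenshot was built from: the namespace, metric name and alarm name are placeholders, and the keys are those of CloudWatch's PutMetricAlarm call as exposed by boto3's `put_metric_alarm`. The `TreatMissingData` setting is a newer CloudWatch option than this post.

```python
def heartbeat_alarm(namespace, metric_name, period_seconds=600):
    """Keyword arguments for CloudWatch's PutMetricAlarm (boto3: put_metric_alarm)."""
    return {
        "AlarmName": f"{metric_name}-missing",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": period_seconds,          # one 10 minute window
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",  # fire when fewer than 1 result
        "TreatMissingData": "breaching",   # no data at all also counts as a failure
    }


# "Loggly" and "CronHeartbeat" are hypothetical names.
params = heartbeat_alarm("Loggly", "CronHeartbeat")
```

Treating missing data as breaching matters for this particular check: if the cronjob dies, no metric is published at all, and that silence is exactly what the alarm is meant to catch.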

Happy alerting!
Posted 9 May, 2011 by Kord Campbell
in Business, Log Management, and Startup
Aren Sandersen, VP Operations for Bebo, came over today and had lunch with us. Afterwards, we sat down and chatted about how Bebo is changing their infrastructure, how they manage logs, and how they use Loggly for debugging, alerting and operational troubleshooting. You can view the video on your iPhone via broadband, or watch the mobile version as well.
You can follow Aren and Bebo on Twitter, or sign up for Bebo on their website.
Posted 22 Apr, 2011 by Kord Campbell
in Business and Startup
Everyone seems to be blogging about how their service has been impacted by Amazon’s AWS outage, or whining about how Amazon sucks, or explaining why their service was architected so well that it didn’t impact them, or why you suck if you didn’t plan for this.

As Loggly is based entirely on AWS and was only minimally impacted during the first few hours of the start of the outage, I figured I’d share exactly how we managed to do what we did:
We run across multiple availability zones and don’t rely on EBS for anything other than backups of a few simple databases. Everything else is file based on the EC2 instances, and their drives are set up in a RAID-1 configuration for speed and slightly more reliability. Our log streams are backed up to S3 every few minutes as they come into our proxies. We rely on RDS for the database for the user logins, which really ended up being the only thing affected.
I asked Jordan Sissel, Head of Ops and Senior Developer here at Loggly, to describe exactly what happened when Nagios/Pagerduty went off the night before last. Here’s what he said:
I saw RDS (prod db) problems in the early morning just as the problems started, but by the time I started debugging it the problem went away. I was notified by pagerduty because beaveroil and some other checks were failing.
Otherwise we weren’t really impacted. We got lucky, I think. I kept my eye on service but it stayed happy during the AWS outages.
Worst case, it’s easy for us (assuming rightscale is functioning, which it wasn’t for some of the day) to migrate to different parts of EC2 due to our use of puppet and our lack of EBS usage (only our RDS uses EBS).
Planning for Failure
Jordan is right, we can pretty much do a Loggly deployment on any AWS region within 20-30 minutes. Because we use Zerigo for DNS, and because we keep short TTLs, we can switch out records, have them updated quickly, and redirect ALL our inbound and outbound traffic to the new deployment.
Of course that leaves the question about migrating data on our existing or failed indexers. Thankfully that’s not a huge issue for us because we can rebuild them using EMR from our S3 backups.
Before we launched Loggly’s public service, I mandated that Loggly must be able to rebuild failed indexers at will. The work required to support this ended up delaying our public launch by at least 45 days. Now if we lose a box or have to move to another region, we can rebuild any (or all) of our indexers in at most a few hours. In theory we could continue to index new data coming into the system and historical search impact to customers would be minimal.
As my friend Clay Loveless put it so elegantly, “We may rip the rug out from under your feet at any moment.” If you haven’t planned for disaster striking, then you should go back and reassess your infrastructure. Hopefully we’ve planned well for just such a disaster.
<knocks on wood>
Posted 20 Apr, 2011 by Kord Campbell
in Business and Log Management
Last week Chris Wensel swung by the Loggly office, had lunch, and sat down to do a short video with me to talk about Cascading, what his company Concurrent has been up to, and to discuss how Hadoop is good for processing a crap load of logs. Check out the video below. You can view the video on your iPhone via broadband, or watch the mobile version as well.
You can follow Chris, Cascading and Concurrent on Twitter, download Cascading, and check out his company Concurrent, Inc. Be sure to keep an eye out for the new version of Cascading which is due out soon!
Posted 16 Dec, 2010 by raffy@loggly.com
in Business and Log Management
I was invited as a guest to the CloudChaser podcast with Matt Grant.
We talked about a number of interesting topics related to logging, cloud, and security.
Log Management Challenges
We discussed a number of log management challenges from log generation to security in the cloud. Here is a brief list of topics we talked about:
- We first touched upon some issues with log file generation. I discussed the lack of logging guidelines and the problems that brings with it.
- How are logs analyzed? One of the problems is that it should really be the application owners that look at their logs. From a security point of view, security analysts should look at the overall picture. But they should not be the only ones looking at those logs. It’s impossible for them to understand all the logs on an intimate level.
- Yet another problem is understanding the logs. Visualization is an interesting way of addressing that issue, especially for reporting and exploration or discovery.
- Large-scale log storage seems to be a problem. Is it? Make sure you set up use-case driven retention policies!
We touched upon a number of other topics. Here is a short list:
- It seems that users are moving more and more into the application layer to collect logs. It’s not just the infrastructure layer anymore!
- Availability, performance, etc. can be a great way of selling your log management budget instead of using security as a selling point.
- Obviously we talked about Logging as a Service and Loggly specifically. A lot of logs are in the cloud or are being moved into the cloud ;)
- Security and regulatory concerns for logging in the cloud are always a fun topic. We discuss this briefly. The upshot is that it often isn’t a show stopper!
But hey, listen for yourself!
Posted 23 Nov, 2010 by Kord Campbell
in Business, Log Management, and Startup
My old friend David Myton from Boxed Ice swung by the Loggly office the other day to say howdy. I sat down with him and did a quick Geek CEO video about bootstrapping, developing product, filling the sales pipe, and listening to him being wise-beyond-his-years about raising capital.
Server Density is now growing at 20% a month, enjoys super low churn, and just got to break-even over the weekend. While others are just dreaming about startups, or grinding away for the man, David is here living the dream.
David Myton, CEO of Boxed Ice from Hoover on Vimeo.
David shared with me that the company will be moving to the Bay Area in a few more months, once they grow a little more and get their work stuff sorted around. They’re more than welcome to squat with us when they do!
Posted 13 Oct, 2010 by Kord Campbell
in Business, Code, and Log Management
Edited on October 14th, for 2 orders of magnitude bad math.
Big data is big news. Big data is a big problem, and big solutions for it can drive big revenues. Because big money is involved, more and more people are writing and focusing on how big of pack-rats we’ve become. There’s only one fact everyone seems to be missing: Big is relative, after all.
Big Data in the Past

Back in the 70s when I was a kid, my family’s oil business had one of these old clunky Burroughs which my mom not-so-fondly called Maribel. Whenever you wanted to invoice someone, you would load Maribel up with the customer’s account history from paper tape and then manually enter the new invoices. When the existing tape got full, you started a new one. The tapes were yellow, about an inch across and maybe 20 feet long.
We stored these tapes in envelopes, and the envelopes were in turn stored in vertical file cabinets. The hall outside my mom’s office was lined with these file cabinets and the cabinets were literally overflowing into the kitchen because there was no more room in the hall for them. If you estimated 5 bits per line, 72 lines per foot, and 20 feet of tape, that would give you roughly 1KB of storage on a single tape. Multiply that by 1000’s of these tapes and I figure we had a total of 1-2MB of data stored in about 100-200 sq ft of space.
Lots of customers, lots of tape, lots of work, and lots and lots of data. At least lots for 1976.
Your Future Arrived Yesterday
In 1996 my future had arrived. I was running a moderate sized ISP, and found myself buying a full-height 5 1/2" 8GB drive from Seagate for my news server. It cost me just over $2,000. With that one drive alone, I could have stored nearly 300 football fields’ worth of Maribel’s yellow tape based data.
Just last weekend at Lucene Revolution I gave some company my email address in exchange for an 8GB USB drive. I promptly tore it apart and extracted from its guts a sliver of a micro SD card. I could easily fit a few thousand of those cards in the space of that old clunky Seagate drive.
Earlier this year an article in Wired quoted IDC as saying the size of the information universe in 2009 was 800 Exabytes. IDC went on to say 2020’s information universe was expected to be a staggering 35 Zettabytes, nearly 44 times as much data as there is in existence today.
For reference, one Zettabyte = one thousand Exabytes, one Exabyte = one thousand Petabytes, one Petabyte = one thousand Terabytes, and one Terabyte = one thousand Gigabytes. That means a Zettabyte = a million million Gigabytes!
That’s around 3 × 10^16 times as much data as we had in our office in 1976! If we decided to store it in file cabinets filled with yellow tape, our dystopian future’s 35ZB of data would take up the surface area of 546 earths. Say what?
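If you want to check the arithmetic yourself, the figures above can be reproduced in a few lines of Python. Everything comes from the numbers quoted in this post except the Earth's surface area, which is an outside figure assumed here (about 510 million square km, or roughly 5.49 × 10^15 sq ft), and the choice to use the low end of the 100-200 sq ft estimate.

```python
# Paper tape capacity, using the post's own estimates.
BITS_PER_LINE = 5
LINES_PER_FOOT = 72
FEET_PER_TAPE = 20
tape_bytes = BITS_PER_LINE * LINES_PER_FOOT * FEET_PER_TAPE / 8   # 900 bytes, roughly 1KB

# Decimal units as defined in the post.
GB, EB, ZB = 10**9, 10**18, 10**21
gb_per_zb = ZB // GB                    # a million million
growth = (35 * ZB) / (800 * EB)         # 2020 estimate vs 2009

# 35ZB compared with the 1-2MB stored in the office in 1976.
ratio_low = (35 * ZB) / (2 * 10**6)
ratio_high = (35 * ZB) / (1 * 10**6)

# Floor space if every multiple of the 1976 archive needed the same room.
# Assumptions: the post's rounded 3 x 10^16 multiplier, the low end of the
# 100-200 sq ft estimate, and an Earth surface of about 5.49e15 sq ft
# (roughly 510 million square km). The last figure is not from the post.
EARTH_SQ_FT = 5.49e15
earths = 3e16 * 100 / EARTH_SQ_FT

print(tape_bytes, gb_per_zb, growth, ratio_low, ratio_high, round(earths))
```

The rounded 3 × 10^16 multiplier sits between the two ratios, and the growth factor comes out at 43.75, which is where "nearly 44 times" comes from.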
It reminds me of something you’d see in a Douglas Adams novel, where thousands of small, slightly cranky robots named Maribel are forced to shovel and store yellow tape rolls until they collapse into a pile of rust several million years later.
Smell the Data Exhaust
Data exhaust can be defined as the machine events generated when a user accesses data stored on a system connected to the Internet, such as when a user accesses their photos on Flickr. Hadoop Karma indicates Flickr was storing 4 billion photos by the end of 2009. In aggregate, those photos are stored on thousands of servers and are being viewed by millions of users across the globe every day.
In a simple scenario where all the photos on Flickr were viewed once each by a single user, the logs would weigh in at just over 2TB! In reality, Flickr’s log volume probably exceeds a Petabyte a year for just the views of the lightbox pages alone. Facebook’s numbers are even scarier. In one month they’ll store 2.5 billion photos on their system. In turn, all the people viewing those photos will generate an order of magnitude more log data than Flickr even has in all the photos they’ve ever stored.
Even though we’re in private beta at the moment, we’re already seeing combined log volumes of around 3GB a day from 15 customers. A few of our customers, including About.me and Server Density are sending us near the max of what we allow on the private beta right now. We expect those volumes to go up considerably when we launch the public beta in December, where an average customer could be sending us anywhere from 1 to 5GB a day each. It won’t take long to start referring to our data in units of Petabytes stored.

While demand for storing all those logs is accelerating along with all the data being generated, the technology behind the storage and processing of data also continues to accelerate. Within a few months time, the technology we are developing at Loggly will provide companies a way to peek into these large volumes of log data – where they couldn’t before – and allow them to see exactly what their users are doing with all that big data.
Loggly’s features for search, reporting and map reducing will make dealing with these huge volumes as trivial as stuffing a yellow punch tape into an envelope, except we don’t need a robot named Maribel to do it.
And so the Universe ended.
Posted 8 Oct, 2010 by raffy@loggly.com
in Business and Log Management
The other day, Andreas from Nemertes posted a blog on The missing piece of cloud security?. In his blog post, Andreas talks about how there is no real solution for handling logs in the cloud. Due to the fact that Loggly has been in private beta, I can’t really say that Andreas is wrong.
Instead of (or in addition to) reading the rest of this blog, listen to the podcast that we recorded last Wednesday as a follow up to Andreas’ blog entry.
Loggly is the first logging as a service (LaaS) platform.

In his blog post, Andreas mentions a number of challenges associated with customers doing their own log management in the cloud (I took the liberty of expanding the list a bit):
- Ephemeral virtual machines ask for log centralization.
- Centralization of logs creates a single point of failure.
- Installation and maintenance of a logging server takes time and costs resources (and money).
- Knowledgeable (and expensive) personnel are required to configure logging solutions and maintain them.
- Static solutions (installing your own log management tool) do not fit into the cloud model of “pay as you go”.
- Building a scalable and reliable logging solution in the cloud is hard and expensive.
These reasons and a number of others are the foundation of Loggly. We are eliminating these problems for our customers.
Let me continue along Andreas’ blog post. He moves on to talk about the benefits and use-cases for a logging as a service platform:
- security information management
- regulatory compliance, incident response and post-incident forensics
- control, visibility and resilience, while preserving “chain of custody” for audit purposes
In my view, these are fantastic use-cases for a logging as a service platform, but there are so many other uses, especially in the world of Web applications. There is a big, big ecosystem around application logging that benefits greatly from a logging as a service platform. And to support that ecosystem, we will keep innovating and adding new features to our platform. One step in that direction is our new HTTP inputs, which allow you to send HTTP posts containing log messages.
Want to know more? Sign up for the Loggly Beta.
Posted 5 Oct, 2010 by Kord Campbell
in Business and Startup
The extended private beta version of Loggly was pushed to production last Friday, October 1st, 2010. This release was planned in early August for release by end-of-September, so we did pretty well on getting it out when we did.
We called this release ‘the extended private beta’ because, well, we are extending the private beta to include more people who signed up on the beta registration page. We were so busy writing code for the release we couldn’t think of anything more exciting than this. Shoot, we were so busy I actually forgot to blog about the release!
If you haven’t gotten your invite yet, please be patient. We’re launching servers as fast as we can! Here’s a screen shot of the thing to tide you over:

Although we’re still in private beta, our roadmap for getting to public beta is only a few months out. We’re planning on rolling out paid services for the private beta users toward the middle of November, and we should open up access to all registrations by the middle of December.
Min Product – Max Volume
In these days of rapid minimal viable product launches, we’ve been comparatively slow in launching Loggly’s service. Unlike other MVP offerings, Loggly will be expected to handle customers that send in anywhere from a few MB to multiple GB of log data per day. Before we launch we have to test that the systems will scale, and won’t melt under load. The extended private beta is part of that testing.
While we think we’ve got the scale issue licked for the moment, we’ve decided to add a feature which rate limits early beta accounts to 200MB/day of data. So we don’t block the upstream syslog servers, we’ll continue to accept and count event data, but will discard any data which exceeds the account limits. Limits are reset at midnight GMT, but if you need a higher daily limit please let us know.
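The accept-count-discard behaviour described above could look something like this. It is a simplified sketch, not Loggly's actual limiter: the class and attribute names are invented, and 200MB is taken to mean 200 × 1024 × 1024 bytes, which the post doesn't specify.

```python
from datetime import datetime, timezone


class DailyQuota:
    """Count everything received, but keep only what fits under the daily limit."""

    def __init__(self, limit_bytes=200 * 1024 * 1024):
        self.limit_bytes = limit_bytes
        self.day = None            # the GMT date the counters belong to
        self.received_bytes = 0    # everything the sender pushed at us
        self.stored_bytes = 0      # what was actually kept

    def accept(self, event, now=None):
        """Return True if the event should be stored, False if it is discarded."""
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self.day:      # first event after midnight GMT: reset
            self.day = today
            self.received_bytes = 0
            self.stored_bytes = 0
        self.received_bytes += len(event)
        if self.stored_bytes + len(event) > self.limit_bytes:
            return False           # over quota: counted, then dropped
        self.stored_bytes += len(event)
        return True
```

Because `accept` always returns right away instead of refusing the connection, an upstream syslog server never has to queue or retry; it just loses the overflow.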
Over the next few months we will raise these limits for the private beta users, and when we launch public beta sometime in December we’ll have a freemium account which will do somewhere in the neighborhood of 250-500MB/day. Paid tiers will be created to handle volumes above these rates. Paid tiers will also have additional features available to them such as S3 storage access, Hadoop processing, etc.
This release was the first of many steps toward making Loggly kick some serious ass when it comes to storing and searching the log files coming out of your infrastructure.
If you are interested in helping us beta test the product, please fill out the signup form and then give us a shout out on Twitter. We’ll see what we can do to get you on ASAP!
Posted 9 Aug, 2010 by jon@loggly.com
in Business and Log Management

I was one of three speakers at the Lucene/Solr meetup last month, co-sponsored by salesforce and Lucid Imagination. I don’t know how anyone at salesforce with a window gets any work done, considering the view – take a look at Grant’s photo to see what I mean. Thanks to Bill from salesforce for hosting, and the guys at Lucid for organizing things. You can check out the two other talks here, as well as talks from previous meetups.
UPDATE: I’ll be doing a slightly expanded version of this talk at Lucene Revolution in Boston on October 8th, incorporating some of the stuff I talk about below.
I got a few interesting questions and comments after the talk, so I thought I’d expand a bit on what was in my slides, which were perhaps a little dense.
“Log Search is highly skewed”
In the talk, I said that the most important search data is the most recent. When you have a problem, you’re far more likely to care about what happened in the last few minutes or hours or days than what happened a month ago. That’s not to say that you’ll never need to search older data, just that most of the time, you won’t.
After the talk, though, it became obvious that I should also have said that our users are likely to use search in a way that is also pretty skewed when compared to “normal” search products. Basically, we expect that most people will use the system somewhat sporadically, but that when they do, it’s likely to be a pretty intensive session of bug hunting. So instead of a fairly continuous search load, we get random spikes for a small subset of all the data we have in Solr. This is actually good for us, because we don’t need to keep all of the shards for all of our customers “hot” in Solr. When a customer shows up, we can warm their data quickly, and let Solr and the filesystem cache do their thing to deal with shards that haven’t been used for a while.
The most important point here is that the overall system is going to be spending the vast majority of its resources on indexing, rather than searching. I can’t give you numbers, but if we end up spending anything more than about 5-10% of our cycles on search, I’ll be very surprised. This is not your typical consumer search product.
0MQ
I talked a bit about 0MQ, and said that we chose it primarily because it’s fast and lightweight, even though it’s possible that we could lose data if things break. I clarified this a bit in a comment on Sarah Allen’s blog because I want to make sure the message is that 0MQ is awesome, not that it loses data. Here’s the guts of what I said…
I wanted to clarify one point in your writeup, though, to make sure people don’t get the wrong idea about 0MQ. Yes, our implementation of 0MQ has a potential “leak”, where we can lose messages, but it’s a very uncommon case, and the impact is small. Specifically, if one of the solr nodes dies hard, we potentially lose any events that were sent to it in the last batch (0MQ batches to minimize comms overhead). In steady state, 0MQ is rock solid, 100% reliable, and faaaaaast.
Pieter (at iMatix) and I are currently discussing ways to solve the hard death problem, and I don’t anticipate it being a problem very long. As I said in the talk, 0MQ is unbelievably cool – if you haven’t got a project that needs it, make one up!
We sponsored some work to get the SWAP functionality in version 2 of 0MQ, and I’ve been blown away by the guys at iMatix – they really want 0MQ to work, and work well. My throw-away comment prompted an email from Pieter asking for more details, and, as I said to Sarah above, we’re already looking at how to fix it.
Oh, and in case you’re wondering how fast a one-armed paper-hanger is, take a look at what The Word Detective says about it (scroll down till you see the “You missed a spot” section). Maybe I should have used “flat out like a lizard drinking” instead?
Sharding
The way we create shards by indexing, then merging, then merging again and again and again raised a few questions that are worth repeating…
To recap, we build small (5 minute) shards on our hot indexers. When we stop adding events to them, they get merged with older shards until we hit another size limit (30 minutes). They then get merged with even older shards, until we hit the next time limit (4 hours). And so on up the chain until they cap out at a week long. Along the way, we push indexes from box to box, to balance the load on the system as a whole.
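As a rough illustration of the merge ladder, here is a sketch in Python. It is not our indexer code: the names are made up, timestamps are plain Unix seconds, and shards are assumed to be aligned to multiples of their span. Only the spans mentioned above are listed; the steps between 4 hours and a week aren't given, so they are left out.

```python
MINUTE = 60
HOUR = 60 * MINUTE
WEEK = 7 * 24 * HOUR

# Shard spans named in the post. The steps between 4 hours and a week are
# not spelled out there, so they are left out rather than guessed.
TIERS = [5 * MINUTE, 30 * MINUTE, 4 * HOUR, WEEK]


def bucket_start(timestamp, span):
    """Align a Unix timestamp (seconds) to the start of its shard."""
    return timestamp - (timestamp % span)


def merge_target(shard_start, tier):
    """Return (start, span) of the larger shard this one merges into, or None at the cap."""
    if tier + 1 >= len(TIERS):
        return None
    parent_span = TIERS[tier + 1]
    return bucket_start(shard_start, parent_span), parent_span


hot = bucket_start(7_500, TIERS[0])   # 5 minute shard holding t=7500
parent = merge_target(hot, 0)         # the 30 minute shard it rolls into
```

With shards aligned this way, the name of a shard (its start time and span) is enough to work out which larger shard it will eventually be merged into.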
The first question is fairly obvious: Why?
At first glance, it seems like we’re just creating work for ourselves. Surely we could just build the shards and use them as is, right? The problem is that we would have a lot of 5 minute shards floating around the system, and we already know that Solr starts getting cranky when you run a lot of cores in a single instance. So, why don’t we just build bigger shards? The issue there is that with the version of Solr we’re using, we have to reopen the index to make new data available, and we currently do that every 10 seconds (hence the “NRT + SolrCloud = Our Nirvana” in my slides). Since we have to do this, we’d end up with too many segments in the hot index, or (if we’re not careful with our merge factor) a lot of automatic merging that means that the hot index becomes unavailable for updates for too long for my liking. So, we got pushed into this approach by something that I’m hoping will soon be a thing of the past. I’m really looking forward to Michael Busch’s talk at Lucene Revolution which promises to remove the “N” from NRT. I’m not sure what is better than nirvana, but I’m hoping to find out soon
We may have been forced into doing things this way, but there is a lot of value in the model we have. In some ways, we’re taking over a part of Lucene (merging) that has been absolutely invaluable, but can sometimes be a little difficult to control. We now have complete control over when and where indexes get merged. I probably should point out that we deliberately don’t do any merging on the 5 minute shards, and that we’re careful with the merge parameters on the larger shards to make the merges that do happen as efficient as possible. The model also gives us a very simple index naming scheme based on time, which means we always know exactly where to find data for a time-constrained query. More on this in a bit…
The next question (from the meetup) was what is the overhead of all this merging?
Rather than give numbers, it’s worth thinking about whether we’re actually doing anything more than Lucene already does when you start building big indexes. I think the answer to that is that we’re actually just exposing and taking over the automatic behaviour, rather than doing something “extra”. So I think the real overhead is close to zero. Compared to building a bunch of shards in parallel using Hadoop, we’re certainly doing more work, but most of the Hadoop based systems I’ve looked at are geared more towards building indexes from a large existing corpus, rather than dealing with a real time stream.
My final comment on this is that since it’s all completely configurable, we’re not locked into any of the times I’ve mentioned above. Maybe when we move to NRT, or RT, we can bump the hot shard size up to hours or days, assuming that we’re still in control of merging. We shall see…
Constructive Laziness
Circling back to the first section, where I talked about how skewed we expect our search to be, the time-based shards give us a very clean way to limit the impact of our search requests. Since we can constrain a search to a specific time period, it’s easy for us to identify which indexes we need to hit to satisfy the search. Our ideal search is for something in the last few minutes, which can be entirely served out of one or two of the five minute shards. We may have gigabytes or (hopefully) terabytes of index data for the same customer sitting around on our system, but if we can satisfy their request by hitting two small, heavily cached cores, then we’re in great shape. I wonder if life will be so kind to us?
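To make the “which indexes do we need to hit” idea a little more concrete, here is a rough sketch in JavaScript. It assumes one shard per 5 minute window, named after the UTC start of that window. The naming scheme and the function names (shardName, shardsForRange) are made up for illustration and are not how our system actually names things.

```javascript
// Hypothetical scheme: one shard per 5 minute window, named by the UTC
// start of the window. The real index names are not shown in this post.
const SHARD_MS = 5 * 60 * 1000;

function shardName(windowStartMs) {
  // toISOString() gives e.g. "2010-05-20T12:10:00.000Z"; keep up to the minutes.
  return "shard-" + new Date(windowStartMs).toISOString().slice(0, 16);
}

// Return the names of every shard whose window overlaps [startMs, endMs].
function shardsForRange(startMs, endMs) {
  if (endMs < startMs) throw new RangeError("endMs must be >= startMs");
  const names = [];
  // Round the start down to a window boundary, then step window by window.
  const first = Math.floor(startMs / SHARD_MS) * SHARD_MS;
  for (let t = first; t <= endMs; t += SHARD_MS) {
    names.push(shardName(t));
  }
  return names;
}

// A search from 12:13 to 12:17 UTC only needs two small shards.
const hits = shardsForRange(
  Date.UTC(2010, 4, 20, 12, 13),
  Date.UTC(2010, 4, 20, 12, 17)
);
console.log(hits);
```

The point of the sketch is that the lookup is pure arithmetic on the time range: no catalog query is needed to decide which cores a time-constrained search touches.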
Random aside: Synchronicity
Every now and then, things just come together in strange ways. A couple of weeks ago, Kord and I talked with Diego and Santiago from Flaptor, who are working on IndexTank. Diego and I were at LookSmart together years and years and years ago, but that’s not the synchronicity. As we were talking, Diego said they were working on a “Nebulizer” which does automatic distribution of their index in the cloud. The day before the meeting, I’d pulled all of the code that deals with this in our system into a class named “TheDecider” (I’m still wrestling with a way to make misunderestimate() a useful method in this class). That evening I went to a NoSQL meetup, and met someone who is also working on the equivalent for their system. Maybe there is something in the air?
Posted 27 May, 2010 by raffy@loggly.com
in Business and Log Management
Last week I started playing some more with the Logging APIs from Loggly. For the first time I started embedding AJAX calls to the API into a Web application running on an external domain. Well, guess what happened? The browser barked at me, telling me that I couldn’t execute a cross-domain AJAX call. I guess from a security perspective, that makes a lot of sense. However, I started thinking about how I could overcome this problem. One way would have been to skip AJAX and write some server-side code that fetches the information from the Loggly API and then presents it back to my Web application. I could even expose the information as an endpoint on the same domain, which I then query from my application (see Figure).

Well, this seemed wrong. Why design a really nice, RESTful API if developers who want to use it have to build a server-side wrapper first? This didn’t make sense to me. So I kept digging. Fortunately, I found the solution. It’s called JSONP (JSON with Padding). Here is how it works and how you can leverage it in your own applications.
Let’s assume I am building an application at labs.loggly.com that will access the API located at loggly.loggly.com. With jQuery, my AJAX call looks as follows:
$.ajax({url: "http://loggly.loggly.com/api/search/?q=ntp", username: "guest", password: "loggly", ...})
Now, if you do this, you will get the cross-domain error. However, if you just slightly change your call to include an extra parameter, it will succeed:
$.ajax({
  url: "http://loggly.loggly.com/api/search/?q=ntp",
  username: 'guest',
  password: 'loggly',
  dataType: 'jsonp',  // the extra parameter that switches the call to JSONP
  success: function(data) {
    // data is the object handed to the generated callback function
    flare = data['data'];
  },
  error: function(XMLHttpRequest, textStatus, errorThrown) {
    alert(textStatus + " - " + errorThrown);
  }
});
Note the newly added dataType parameter. That’s it? Yes, that’s it. It will work like a charm. No more cross-domain security issues. Two things happen behind the scenes. First, the AJAX request that is executed carries one extra query parameter: &callback=?, where the question mark stands for a unique string that jQuery generates. Second, on the Loggly side, if the callback parameter is present, Loggly does not return the plain JSON element that you would expect, but wraps it in a function call. Something like:
jsonp12312312({data: {"May-20-2010 12:13:45": 2}, numFound: 1})
The next thing that happens is that when your browser gets the answer back like this, it will try to execute the function called jsonp12312312. jQuery handles that for you internally by creating a function with that name that points to the success function provided to the AJAX call.
That’s really it. We are looking forward to seeing your applications that use the Loggly APIs!
By the way, Loggly is using Django Piston for handling the APIs. The library automatically handles JSONP responses when a parameter called “callback” is present!
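If you want to see the whole round trip without jQuery or a server, here is a small simulation in plain JavaScript. It is only a sketch: wrapJsonp and requestJsonp are made-up names, not part of jQuery, Piston, or the Loggly API, and running the response through new Function is a stand-in for what a script tag does in the browser.

```javascript
// "Server" side: wrap the JSON payload in a call to the requested callback.
function wrapJsonp(callbackName, payload) {
  // Only accept a plain identifier so a caller cannot inject arbitrary script.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(payload) + ")";
}

// "Client" side: register a temporary global function, then execute the
// response text as script so that it calls that function.
let counter = 0;
function requestJsonp(fetchScript, onSuccess) {
  const name = "jsonp" + Date.now() + "_" + counter++;
  globalThis[name] = function (data) {
    delete globalThis[name]; // one-shot hook, cleaned up after use
    onSuccess(data);
  };
  const scriptText = fetchScript(name); // pretend this went over the network
  new Function(scriptText)();           // stand-in for the browser running it
}

let received = null;
requestJsonp(
  (callback) => wrapJsonp(callback, { data: { "May-20-2010 12:13:45": 2 }, numFound: 1 }),
  (data) => { received = data; }
);
console.log(received.numFound);
```

The check on the callback name is my own addition. Since the server echoes that name straight into executable script, it seems prudent to restrict it to a simple identifier.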
Posted 18 May, 2010 by Kord Campbell
in Business and Startup
On Monday, Loggly closed a $4.2M B round, with Trinity Ventures leading and True Ventures participating. As you may recall from my previous post, True led our initial seed investment, which was closed 5 months ago to the day.
My relationship with Trinity goes back over a year and a half, well before Raffy and I started thinking about doing a cloud based log management offering. Like many other startups, Trinity was started by two entrepreneurs. For years, Trinity has focused on early stage companies in specific technology categories, such as cloud computing and systems management.
Loggly is extremely fortunate to be working with Trinity and their bright team, and we greatly value the market experience they bring to the relationship.
History Lessons
I met Trinity through a fairly short introduction path. My good friend and former colleague, Dakota Sullivan, introduced me to a gentleman named Matt Strand in January of 2009. Matt and I had coffee at Crossroads Cafe in South Beach where I told him I was looking to join or start a cloud computing based company. Matt figured he should hook me up with a VC buddy of his, Dan Scholnick at Trinity.
Here’s the email introducing the two of us:
Dan <> Kord
Dan, please meet Kord Campbell. He is a serial entrepreneur interested in cloud computing, systems management, etc. with a few interesting ideas brewing. He is the tallest person I’ve met in at least a year or two.
Kord, please meet Dan Scholnick. He was one of the first employees at Wily and is now focusing on investments in your area of expertise for Trinity Ventures.
I think it’d be valuable for you two to connect. Let me know if there’s anything further I can provide, otherwise I’ll step back here and let you guys connect directly.
Best,
Matt
Matt was right about it being a valuable connection. Over the next year Dan and I would spend time together drinking coffee, chatting on the phone, and emailing each other about ideas in and around the enterprise and cloud computing space.
It was because of my conversations with Dan that Raffy and I were able to come to the idea of a cloud based logging service. Even when it came time to start pitching Loggly to others, Dan and Noel assisted us in honing our pitch, which eventually led to us being funded by True in a seed round.
Start Small, Go Big
When you are starting out, even the smallest conversation or the shortest email could potentially be the most important one you’ve had in years. Having an idea, growing it, and turning it into a business is a complicated process. That process takes time, and doesn’t happen over a matter of days, or even weeks, but instead over months and even years.
Our relationship with Trinity has been a long time in the works. While it may have appeared to happen rather quickly, Loggly’s efforts with Trinity started at the very beginning of its life.
Inasmuch as your idea should evolve over time, your ability to convey the idea and the opportunity it represents should grow as well. I’ve lost count of how many times Dan has told me to ‘crisp up’ my presentation, discussed partnership negotiation strategies with me, or told me how to approach feedback with our current private beta testers, but I’m sure as hell glad he did.
Without investors like Trinity and True, it’s unlikely I’d be here telling you this story. You would do well to seek out these types of investors when you are looking for direction and guidance for your idea.
Now you’ll excuse me while we get back to coding. We have beta testers who have logs in need of indexing!
Posted 15 Apr, 2010 by Kord Campbell
in Business and Log Management
Dave Rosenberg posted an opinion about cloud based logging yesterday on his Software, Interrupted blog. Dave starts out by mentioning Gartner predicted IT would spend more money on private cloud than public cloud through 2012. Here’s the exact quote from Gartner:
“Despite the economies of scale offered by public cloud providers, private cloud services will prevail for the foreseeable future while public cloud offerings mature, according to Gartner, Inc. Through 2012, IT organizations will spend more money on private cloud computing investments than on offerings from public cloud providers.”
This statement is a bit like NASA doing a press release announcing the moon is continuing to orbit the earth. Wow! The moon, still here next year? That’s awesome. Of course IT is going to spend more money on virtualization for the next few years. The success of the private cloud can be attributed to the fact that virtualization has been around for a good while now, and is finally being pressed into mainstream use behind the firewall. Shoot, I think I was running Wine on some of my Linux boxes back in the mid-90s, which means virtualization has been commercialized for at least 15 years. The idea of virtualizing an OS goes back well into the 60s. Come to think of it, so do I.
The public cloud, specifically IaaS and SaaS, is a grouping of emerging technologies. We’re just now starting to figure out how to wield it correctly for new business models. Poking holes in it at this point is simply rabble rousing by companies whose business models are threatened by it and people who don’t understand it or have a use for it.
It’s a Complicance
Guy Churchward tries to make some good points in his talk with Dave, but at the end of the day, LogLogic is mainly an appliance vendor, and not only do they have big-time COGS to worry about, they also have to figure out how exactly a cloud customer is going to deploy their box on Amazon’s EC2 service. (Hint: They aren’t.) While you might be able to send logs back out of the cloud to an appliance behind the firewall, it’s unlikely to make economic sense to do so in the long term.

While there is a valid point in calling out cloud concerns, security itself is ALWAYS a concern, regardless of whether you run in the cloud or in your own datacenter. Frankly, with Loggly I’m likely better at storing and securing your logs than you are by yourself in your own data center, mostly due to the fact I’m under pressure by multiple people like you to provide a service which is expected at the outset to be secure. It’s no different than the pressure that Google has on them for securing your email, SalesForce for securing your leads, or Amazon securing your credit card info. We’re all culpable here for the security of your data.
Additionally, not all that cloudy data is created equal. A lot of the companies running in the cloud today are web based app companies, and the data they generate is oftentimes very public in nature and not at all affected by compliance concerns. Do you think some user on Flickr cares if I stole all their comments? What about getting access to all those juicy tweets of mine? Oh wait, those are already in the Library of Congress. Never mind, false alarm!
When IT Rains IT Pours
Log file data is already one of the largest sets of data on the planet. Logging alone in the public cloud is going to be absolutely staggering over the next few years. These trends are being driven by people switching to SaaS based applications whose infrastructure either requires the elastic capabilities only the public cloud can provide, or whose price point can’t be matched by private cloud offerings.
The elastic nature of these infrastructures means the logs which they generate need to be collected and stored in a centralized location before the box that generated them disappears. There are many types of logs which are valuable to a company for understanding their business, and not so valuable for those data-thieving ruffians everyone keeps talking about.
While the security access data or net-flow information from public cloud vendors might alleviate the concerns of some consumers, I think there are much higher value adds to these offerings by being able to power availability and analytics services around a company’s application via a log file storage platform.
While the private cloud may continue to orbit peacefully for the next few years, the use of it for web based services will decay eventually, and it’ll be relegated to the more mundane stuff like storing my dental records and tracking my orders over on RadiatorBarn.com.
BTW, I’m still waiting on my radiator, Burton.
Posted 1 Mar, 2010 by raffy@loggly.com
in Business

I am attending the RSA conference this week. The first session I attended was the Cloud Security Alliance (CSA) meeting. Reading some of the accompanying material and listening to some of the presentations and panels, I couldn’t help but notice that the terms auditing and logging were all over the place.
Here is my attempt at an explanation. It seems that one of the reasons for this is the nature of the cloud. Think about it. You are in an environment where you don’t control much. You are in an environment where you cannot trust most of the infrastructure pieces. For example, if you are using AWS like we are doing at Loggly, you should generally not trust your AMIs (the OS images). Now, what do you do if you don’t trust someone? You observe them, you monitor them. That’s exactly what is happening, and needs to happen, in the cloud: You don’t trust the service. To mitigate this issue, you are going to monitor the service.
And to make this not just my explanation, here is what some panelists during the CSA meeting said:
“Loss of visibility in the cloud” – Scott Chasin, CTO McAfee SaaS Unit
“Lose control and still maintain accountability” – Ken Biery, Verizon Business.
Is the cloud the killer app for logging? And if that’s the case, how do you manage your logs in the cloud? There are hardly any cloud logging solutions out there. I think you see where I am going with this.
Posted 17 Dec, 2009 by Kord Campbell
in Business and Startup
Loggly just closed an A round with True Ventures on Wednesday. From start to finish, Raffy, Jon and I talked to over 20 capital firms, with fund sizes ranging from a few hundred thousand dollars to over a billion dollars invested. In all, we spent exactly 90 days on our capital raising efforts, starting with essentially nothing, and then authoring and tweaking the executive summary, financial model, and investor presentation as we went. Oh, and we wrote a crapload of code in there too. The Loggly Beta deadline waits on no man.

Perhaps it was fate that we spoke with Puneet Agarwal at True Ventures first. True has a massive amount of experience investing in and managing early stage companies. Their record of past successes speaks for itself, and their team has experience with over 100 early stage investments that have generated significant investment returns. Frankly, Raffy, Jon and I are extremely fortunate to be working with the True Ventures team.
That first meeting with Puneet was actually quite easy; we had no expectations other than sharing what we were thinking with someone who knew the space well. That was the calm before the storm though. Over the coming weeks we struggled with writing our investor deck, meeting schedules, market size expectations, and investors’ lack of familiarity with our market, and became consumed with correctly casting the “going big” portion of our pitch.
Going Big
The best way to describe what “going big” means is to just be blunt about it. It means, “How are you going to make your idea – your startup – compete effectively in a multi-billion dollar market?” You know, how are you going to get big like Apple, or SalesForce, for example. “Say what?”
A bunch of you capital guys and seasoned entrepreneurs will nod your heads vigorously at this statement. “Yes, yes. You need to show how this gets really big!” And, for all practical purposes, you guys are absolutely right. For a largish VC, say with a fund of a billion or so dollars to invest, they HAVE to go with early startup guys who are going to go really, REALLY big. It’s not a matter of money, it’s actually a matter of time. If you have a BILLION dollars, and you invest a million in each company you fund, that’s a THOUSAND companies you need to talk to, investigate, vet, poke at, wrangle with, grow to love, etc. Yeah, no. That’s not going to work unless you focus on a given market.
You’d have to filter fast. Kick out the guys that MIGHT do a run rate of low 10s of millions a year (crazy, right?) because they aren’t big enough. Shoot for the guys who tell a good story about how they are going to turn into Twitter, or Facebook, and exit for billions. Find those guys! This results in an investor funding a bare handful of early stage startups each year, even if they say otherwise on their website.
It also serves another purpose, all those meetings with those small startups. It allows an investor to form early relationships with companies who are successful at getting through the valley of death. If an investor finds out a startup they talked to early on is doing well, has revenue coming in, is growing, and expanding to the “going big” event, then maybe they might need some more capital. Maybe it’s time to invest.
Entrepreneur Up
If you are doing an early stage startup and are going to raise capital, you need to toughen up a bit before you go out. Remember, it only takes one firm to believe in your idea, but you are going to get many times that number of rejections before that event transpires. If you get a bad review from someone, take their advice in stride and figure out how it applies to you. Tweak as needed, and move on to the next firm to vet what you’ve discovered.
Above all, be honest with yourself and your assumptions and don’t give the investor who gave you a bad review a hard time. If you think you’ve been asked to prove something unreasonable, like how you are going to become a billion dollar company when it’s just the 2 of you and 1,000 lines of code, then say as much to the investor. Don’t be afraid to say you don’t know, or restate what you do. Don’t be afraid to talk through how you get big with the investors.
After being asked how we got to a billion dollar valuation by one VC, I turned it around on him and asked how he knew he was going really big on his company (which exited for >$1B). His answer? “We didn’t know until we got there!”
Loggly is going to go big, of that I assure you. But first, we have a product to build, customer and partner relationships to forge, and problems to solve for storing a ridiculous amount of log files. Once we have these tasks behind us, we’ll have a great handle on how we’re going to go really, really big.
Posted 19 Nov, 2009 by Kord Campbell
in Business and Startup
We’ve been on the investor tour for the last month and a half at Loggly. We spent about a month working on the investor presentation, executive summary and revenue model to prepare. Based on the better feedback we’ve received, we’ve continued to refine the pitch and plan.
We’ve heard everything from “you need to show me how this is a billion dollar company” (lol wut?) to “move the team slide just below the problem statement”. Bad advice aside, I remain focused on our market size and, in particular, the user conversion assumptions that go along with it.
I’ll be transparent here. I expect Loggly’s model to yield a 10-20% conversion rate to paid customer from a signed up freemium account. While I have good justifications for those numbers, posts like this, by Derek Haynes of ScoutApp, would appear to contradict my assumptions, especially with comments like “The 1% rule is the Pi of freemium web apps!”. In all fairness he has an excellent point about focusing on retention optimization, but should we so blindly ignore our conversion pipe just because the 1% rule says we’ll get the users?
Theories != Rules
As it turns out, the “1% rule” is based on observations on other websites and expected conversion from a site visitor to a paid user. Coincidentally enough, this assumption echoes the controversial content creation theory and smacks of Chinese Math.
The main problem with this approach is it completely and utterly ignores the remainder of a given site’s user pipeline. Pipelines, or funnels, are the bread and butter of any sales driven organization. While sites like YouTube may have simple funnels, sites like Loggly are a bit more complicated, and a missed assumption somewhere along the way can turn your assumed 1% conversion into a .1% conversion fairly quickly. To illustrate, here’s a short list of what we’ll be tracking at Loggly (per month):
- number of ad views on ad network
- number of visits
- uniques
- % visits and uniques from ads
- bounce rate
- content consumed (price sheets/videos)
- participation in the demo (low friction getting started)
- freemium signups
- use of a freemium account
- use of certain features in the freemium account
- conversion rate to paid accounts
- use of paid account
- logging rates
- upgrades from one paid tier to another
It’s within this pipeline that we focus our attentions to help maximize conversions from visit to paid user. Not surprisingly, the devil is in the details.
Care and Feeding of Your Pipeline
Portions of a pipeline need to have estimated percentages assigned to them, and then you need to actively monitor those conversions from step to step when the site goes live. I’ll start with the expectation of 1 conversion from 100 unique visits to a paid account and then project assumptions about the other portions of the pipeline:
- 10,000 ad views yields 100 unique visits (1%)
- 100 unique visits yields 20 content consumers (20%)
- 20 content consumers yields 10 demo participators (50%)
- 10 demo participators yields 5 freemium signups (50%)
- 5 freemium signups yield 1 paid account (20%)
Notice this gives us our 1% conversion from unique visit to paid user, but still shows a 20% conversion from freemium to paid account. Now we do a breakdown for the intermediate steps, and find places we can optimize. Your mileage may vary, so make sure you understand your steps, and measure the effects of changes on your conversion numbers when you are able to do so!
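If you’d rather play with the numbers than take my word for it, the list above boils down to a few lines of JavaScript. This is just a sketch of the arithmetic: the step names and percentages come straight from the list, while the function name projectFunnel is made up for illustration.

```javascript
// Each step converts a percentage of the previous step's count.
// Rates are whole percentages so the arithmetic stays in exact integers.
const funnel = [
  { step: "unique visits", percent: 1 },      // of ad views
  { step: "content consumers", percent: 20 }, // of unique visits
  { step: "demo participators", percent: 50 },
  { step: "freemium signups", percent: 50 },
  { step: "paid accounts", percent: 20 },
];

// Walk the funnel from a starting count and record the count after each step.
function projectFunnel(startCount, steps) {
  let count = startCount;
  return steps.map(({ step, percent }) => {
    count = (count * percent) / 100;
    return { step, count };
  });
}

const projection = projectFunnel(10000, funnel);
for (const { step, count } of projection) {
  console.log(step + ": " + count);
}

// Overall conversion from unique visit to paid account, as a percentage.
const visits = projection[0].count;
const paid = projection[projection.length - 1].count;
const visitToPaidPercent = (paid * 100) / visits;
console.log("visit to paid: " + visitToPaidPercent + "%");
```

Change any single percentage and rerun it to see how quickly a missed assumption in the middle of the pipe drags down the end-to-end number.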
Content is King
Based on prior experience, we know the subject material of Loggly should yield decent conversions from a visitor to content consumer, where the parties are interested in the offering. At previous engagements we were able to get 10K unique video views a month off 100K visits per month. Add in another 10% viewing pricing, examples, etc., and 20% starts to seem reasonable. By driving viewing content, we increase stickiness, educate the users about the service, and increase conversions to freemium signup.
Once the user is viewing content, we want to convert them to using the demo, and then on to creating an account later. Loggly’s demo will actually be a live demonstration of the service using the user’s own content, but it won’t require the user to sign up in the traditional sense. By flipping the signup process on its head, we get the user using the features first, then ask them for account information after they’ve decided they like the service. All in, and based on the frictionless signup, we expect 25% of users who viewed advanced content to convert to a freemium account.
Get Out of the Way!
Remember, your signup form is a HUGE barrier to users. If you require them to provide 12 fields of data to fill out, it’s that much less likely you’ll get a signup out of the deal. Just look at this signup form – I gave up after 5 minutes of trying to fill it out properly (picky password, clearing fields on submission, etc.), even though I’m highly interested in IBM’s offer. Most people won’t spend more than a minute or so filling out your signup form. The lesson here is to get out of the way so a user can start using your service quickly.
Finding Cash in the Pipe
Remember, freemium accounts are not the same as free trials. A free trial expires, but may contain all the features of a paid account. A freemium account doesn’t expire, and usually contains a subset of features of the paid account. By keeping the freemium account active for a longer period of time than the trial (and by keeping the cost of providing services to it low) you extend the amount of time a freemium user can convert to a paid user. While this may not show up in the first few months, over time it will have a compound effect on your conversions as users desire more features from the service.
At this point in the pipeline we are at a 5% conversion rate from visitor to freemium account, which so far seems to be a reasonable assumption. If we take the 10-20% estimated conversion rate to paid assumptions, then we arrive at a .5% to 1% conversion rate from visitor to paid account, which is within striking distance of our 1% goal.
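Here is the same range worked per 1,000 visitors, as a quick JavaScript sketch. The 5% and the 10-20% figures are the ones from this post, and the function name paidAccounts is only for illustration.

```javascript
// From the post: 5% of visitors reach a freemium account, and 10-20% of
// freemium accounts are assumed to convert to paid.
const VISITOR_TO_FREEMIUM_PERCENT = 5;

// Paid accounts expected from `visitors`, for a given freemium-to-paid rate.
function paidAccounts(visitors, freemiumToPaidPercent) {
  const freemium = (visitors * VISITOR_TO_FREEMIUM_PERCENT) / 100;
  return (freemium * freemiumToPaidPercent) / 100;
}

const visitors = 1000;
const low = paidAccounts(visitors, 10);
const high = paidAccounts(visitors, 20);
console.log(low + " to " + high + " paid accounts per " + visitors + " visitors");
```

Five to ten paid accounts per thousand visitors is the .5% to 1% range quoted above, so the whole estimate hinges on where the freemium-to-paid rate actually lands.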
In the next segment of this post, I’ll discuss analyzing how users use their accounts, and steps you can take to maximize conversion from freemium to paid.
Edit (11/12/09): Fixed my bullets in the second list.
Posted 5 Oct, 2009 by Kord Campbell
in Business and Startup
I’m delighted to announce Jon Gifford as the new addition to Loggly’s founding partners. Jon will be taking on the role of CTO at Loggly, and will be acting as Chief Architect for the project. Jon hails from New Zealand, by way of Australia, and did stints at LookSmart, Technorati, Scout Labs, and recently his own startup, Minimal Loop. Jon is a search technology guru who is capable of accelerating search far beyond that of mere mortal men. If you don’t believe me, just try to go search for him on Google. Frankly, I have no clue how he does it. Trust me folks, a bunch of those results are not him.

As for the rest of the team, Raffy is now officially our COO and Chief of Product (BTW, Swiss PMs rule), and I’ll be serving as our Geek CEO and Chief Evangelist. All three of us will be coding on Loggly over the next few months, and with a dash of a soon-to-be-announced designer, we’ll be assembling the framework needed for getting Loggly up and running in private beta over the coming months.
I’m really excited to be working with such smart and capable individuals. I can’t wait to start working with equally smart and capable partners and customers.
See you in your logs!
Posted 9 Sep, 2009 by Kord Campbell
in Business and Startup
The first pass of the audio for the Loggly intro video was finished today. The clip will be used to further refine the wording of the script and start assembling the various visual elements we’ll need for shooting the video. What we should end up with is something similar to the works by Common Craft.
Not surprisingly, it took a TON of work to get the clip to its current state. Most of that work involved writing and editing the ‘script’ to sound like the actor was talking to an audience. I’m no script writer, so it took a fair amount of work with Brenda (the voice actor to whom I’m married) to nail down what sounded natural when spoken, and what didn’t. There are still some rough spots, but here’s the final version of the first draft if you want to take a listen. The script is here, less some final edits.
Back in the day, when John Leestma and I were involved in producing the Splunk Videos, we would do outtakes of the different developers doing their thing during filming. You had to do something to keep things entertaining – a 5 minute video would take all day to film and occupy 1/3 of the available conference rooms. It’s really too bad we didn’t publish them, some of them were side splitting.

Speaking of cracking up, the wife and I have always enjoyed a good laugh together. Here’s hoping you get a hoot from our silly outtakes. All your logs are belong to us!
Posted 21 Aug, 2009 by Kord Campbell
in Business and Startup
I bought a Baby Blue Bottle mic for my wife a few years back. She’s an opera singer, and can actually belt out some serious tunes when the mood strikes her. With her pipes and this mic we’ve gotten some amazing takes. If you’ve never seen/handled/heard a Blue mic, well, it’s a thing of joy to behold.

Now that I’m working on Loggly full time, I realized we need a video which would explain the product in an easy to digest format, along the lines of the Twitter in Plain English video. It struck me I have all the bits I need to do this myself, including an amazing performer’s voice (albeit not using her operatic mode) and a kick ass microphone.
Videos are one of those things that help brand your identity, along with your blog posts, tweets, and of course, your product. The way in which you present your product to the people who use/consume it helps sear your identity forever in their minds. Case in point, Blue Bottle Coffee. A friend of mine took me there a few weeks back and I was blown away by the preparation process. It was a thing of art really, much like the Blue Bottle Mic.

Maybe we should name ourselves Blue Bottle Logs. Whaddya think?
Posted 17 Aug, 2009 by raffy@loggly.com
in Business and Startup
Hello, we are Loggly, a startup located in San Francisco. As you can tell from our Web site, we are in the very early stages of our startup. Shoot, we haven’t even had a chance to let our designer take care of the Web site yet! So, tread with care – anything you find here is early stages!