This group will explore emerging BIG DATA pipelines and discuss the potential for developing standard architectures, Application Programming Interfaces (APIs), and languages that will improve interoperability, enable security, and lower the overall cost of BIG DATA solutions.
The BIG DATA community group will also develop tools and methods that will enable: a) trust in BIG DATA solutions; b) standard techniques for operating on BIG DATA; and c) increased education and awareness of accuracy and uncertainties associated with applying emerging techniques to BIG DATA.
Note: Community Groups are proposed and run by the community. Although W3C hosts these
conversations, the groups do not necessarily represent the views of the W3C Membership or staff.
As I began following news on how Big Data may create opportunities for organizations, I also wondered how organizations might have to change in order to seize those opportunities.
My sense is that, assuming organizations have the technology in place to analyze big data, the most important thing is the ability of people to ask the right questions when interrogating data.
This raises a question: do you think organizations should empower employees more? In other words, should companies design ways for employees to advance their ideas about which data analyses to pursue? For example, one employee might come out and say: "I think our customers are favoring our competitor's product X over ours" — then an analysis of social networks could be carried out to gauge customers' relative perception of the two products.
To conclude, there are several opportunities from using Big Data (are there?), and because managerial attention is limited, employee empowerment may increase the probability of getting value from data.
Faced with the growing importance of the omnichannel customer experience and the expertise required to understand the vision and technology behind data-driven marketing — digital marketing attribution, predictive modeling, dynamic digital profiles, mobile and so on — companies are testing a new position in the C-suite, the chief digital officer (CDO). But what role does a CDO play, and why is this position (or something like it) so critical for your organization?
CDOs are digital-savvy, business-driven leaders who have what it takes to transform traditional businesses into data-driven companies. They combine marketing and management experience with technical know-how and strategic vision to align and improve business operations across the enterprise. I believe this type of broader, enterprise-wide data management scope has become the “mandate of our era.” And given that big data is here to stay – and getting bigger – your company needs a C-level position that specifically provides:
Technical expertise. As I discussed a few weeks ago, the big data hairball embodies both the promise and the threat behind big data and digital channels. A CDO can accelerate your efforts to unlock the data insights that increase sales and drive revenue growth.
Cross-functional finesse. Despite the CDO’s technical expertise, the primary responsibility of this role is not to make tech decisions. Instead, the CDO is charged with making decisions about how data and customers relate. Remember: Data analytics and the customer experience are not mutually exclusive. However, ingraining this fact in your organization will no doubt call for a shift in cultural mindset about data — what it is at your company, what it means to your business, and what you want it to do for you and for your customers’ experience of your brand.
Every department generates data and virtually every customer engagement leaves a digital trail of structured or unstructured information. Creating the systems and processes to capture, organize and leverage the data you’ve already got (and the additional data you know is coming) is the first step to aligning data use with your company’s business strategies. How else can you respond to the changing marketplace?
Big data is about more than sheer quantity. It is a massive volume of both structured and unstructured data, and it is also about the availability of that data, which is growing 50% or more each year. To make sense of such large volumes, we use data mining techniques and specialized tools for big data analysis. Big data is rapidly becoming the next frontier for innovation and decision-making in industry, politics, the public health sector, and many other fields.
Now, can we trust Big Data solutions and their results? Are we chasing noise instead of signal? Big Data has its limits. Not just in public health but in many other sectors, systems collect large volumes of data, and with more data come more chances of false positives. Without considering this fact, the models we build for decision-making or prediction will tend toward overfitting or underfitting. False positives also challenge the credibility of a system, so better predictions and decision-making require that they be accounted for.
More data gives more information, but at the same time it raises the chance of false positives, especially when you search for correlations in the data. More data gives you more witnesses, but that does not mean you are closer to the truth; there is always a chance of false positives. This is where human intuition becomes useful.
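The effect described above can be quantified with the standard multiple-comparisons calculation: the more independent tests (or correlations) you check, the more likely that at least one "finding" is a false positive. A minimal sketch, assuming independent tests each run at a hypothetical significance level of 0.05:

```python
# Probability of at least one false positive across m independent
# significance tests, each at level alpha: 1 - (1 - alpha)^m.
# The numbers here are illustrative, not from any real study.

def family_wise_error(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests tests fires spuriously."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 10, 100):
    print(f"{m:4d} tests -> P(at least one false positive) = "
          f"{family_wise_error(m):.3f}")
```

With 100 tests at the 0.05 level, a spurious hit is all but guaranteed, which is exactly why blind correlation-hunting in large datasets is risky.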
So I think we can, to an extent, address the false-positive issue by repeating tests or analyses. Comparing data against other data sources using record linkage is another alternative. Integrating information from multiple data sources and ensuring interoperability between systems are all measures that need to be taken so that big data is good data. Addressing the false-positive issue will help turn big data into solutions.
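The record-linkage cross-check suggested above can be sketched very simply: link records across two sources by a shared key and keep only the findings both sources agree on. The record IDs and values below are hypothetical:

```python
# Minimal sketch of cross-checking one data source against another via
# record linkage: a signal is "confirmed" only if the linked record in
# the second source carries the same value. All data here is made up.

source_a = {"rec1": "flu", "rec2": "flu", "rec3": "cold"}
source_b = {"rec1": "flu", "rec2": "cold", "rec4": "flu"}

def confirmed(a: dict, b: dict) -> dict:
    """Keep only records whose linked counterpart in b agrees."""
    return {key: value for key, value in a.items() if b.get(key) == value}

print(confirmed(source_a, source_b))  # only rec1 agrees in both sources
```

Real record linkage must also handle fuzzy matching of identifiers, but the principle of demanding independent confirmation is the same.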
We can use HBase as a storage back end for big data. Can you think of any application areas or scenarios that may require semantic integration of different tables in HBase? (Note that the data stored in HBase is not RDF triples, and I am not talking about using HBase as an RDF back end.)
Does HBase fit the column-oriented database (datastore) definition at http://en.wikipedia.org/wiki/Column-oriented_DBMS? If it does, does that mean the KeyValue entries in an HFile are stored on a per-column basis, i.e., all KeyValues of one {column family:column} entry stored together, followed by the next {column family:column} entry?
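For what it's worth, HBase is usually described as column-family-oriented rather than strictly column-oriented: each column family gets its own store files, and within a family's HFiles the cells are sorted by row key first, then column qualifier, then timestamp (newest first) — so cells of the same row sit together, not cells of the same column. A small Python sketch of that ordering (illustrative only; real HBase cells also carry the family and a type byte in the key):

```python
# Illustrative model of the sort order HBase applies to cells within one
# column family's HFile: (row key, qualifier, timestamp descending).
# Tuples are (row_key, qualifier, timestamp); values are omitted.

cells = [
    ("row2", "name", 1),
    ("row1", "age", 2),
    ("row1", "name", 1),
    ("row1", "name", 2),
]

# Row key ascending, qualifier ascending, timestamp descending.
ordered = sorted(cells, key=lambda c: (c[0], c[1], -c[2]))

for row_key, qualifier, ts in ordered:
    print(row_key, qualifier, ts)
```

Note that all of `row1`'s cells come before any of `row2`'s, which is row-major within the family — the opposite of a per-column layout.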
I have written an HBase scanner that converts HBase row entries into RDF. Can anybody suggest a streaming SPARQL endpoint that can be used to answer SPARQL queries without actually storing the generated RDF representation?
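The row-to-RDF step described above can be sketched without a cluster by running the mapping on an in-memory stand-in for an HBase row. The namespace URI, column names, and the `<rowkey> <family#qualifier> "value"` triple shape below are all hypothetical choices, not part of any HBase API:

```python
# Minimal sketch of mapping one HBase row to N-Triples lines, using a
# plain dict in place of a scanned HBase Result. Namespace and column
# names are made up for illustration.

NS = "http://example.org/hbase/"

def row_to_ntriples(row_key: str, columns: dict) -> list:
    """Emit one N-Triples line per {family:qualifier -> value} cell:
    <NS/row_key> <NS/family#qualifier> "value" ."""
    subject = f"<{NS}{row_key}>"
    triples = []
    for column, value in sorted(columns.items()):
        family, qualifier = column.split(":", 1)
        predicate = f"<{NS}{family}#{qualifier}>"
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples

row = {"info:name": "Alice", "info:city": "Oslo"}
for line in row_to_ntriples("row1", row):
    print(line)
```

Streaming these lines into a query processor as they are produced, instead of materializing the full graph, is the piece the question is asking about; the sketch only covers the conversion itself.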